ANTHRAX TOXIN FUSION PROTEINS AND USES THEREOF



This application is a continuation-in-part of
application Serial No. 08/021,601, filed February 12, 1993.

BACKGROUND OF THE INVENTION 

The targeting of cytotoxic or other moieties to
specific cell types has been proposed as a method of treating
diseases such as cancer. Various toxins, including diphtheria
toxin and Pseudomonas exotoxin A, have been suggested as
candidate toxins for this type of treatment. A
difficulty of such methods has been the inability to
selectively target specific cell types for the delivery of
toxins or other active moieties.

One method of targeting specific cells has been to
make fusion proteins of a toxin and a single-chain antibody. A
single-chain antibody (sFv) consists of an antibody light
chain variable domain (VL) and heavy chain variable domain
(VH), connected by a short peptide linker which allows the
structure to assume a conformation capable of binding to
antigen. In a diagnostic or therapeutic setting, the use of
an sFv may offer attractive advantages over the use of a
monoclonal antibody (MoAb). Such advantages include more
rapid tumor penetration with concomitantly low retention in
non-targeted organs (Yokota et al. Cancer Res. 52:3402, 1992),
extremely rapid plasma and whole body clearance (resulting in
high tumor to normal tissue partitioning) in the course of
imaging studies (Colcher et al. J. Natl. Cancer Inst. 82:1191,
1990; Milenic et al. Cancer Res. 51:6363, 1991), and
relatively low cost of production and ease of manipulation at
the genetic level (Huston et al. Methods Enzymol. 203:46,
1991; Johnson, S. and Bird, R. E. Methods Enzymol. 203:88,
1991). In addition, sFv-toxin fusion proteins have been shown
to exhibit enhanced anti-tumor activity in comparison with
conventional chemically cross-linked conjugates (Chaudhary et
al. Nature 339:394, 1989; Batra et al. Mol. Cell. Biol. 11:2200-
2295, 1991). Among the first sFv to be generated were
molecules capable of binding haptens (Bird et al. Science
242:423, 1988; Huston et al. Proc. Natl. Acad. Sci. USA
85:5879, 1988), cell-surface receptors (Chaudhary et al.,
1989), and tumor antigens (Chaudhary et al. Proc. Natl. Acad.
Sci. USA 87:1066, 1990; Colcher et al., 1990).

The gene encoding an sFv can be assembled in one of
two ways: (i) by de novo construction from chemically
synthesized overlapping oligonucleotides, or (ii) by
polymerase chain reaction (PCR)-based cloning of VL and VH
genes from hybridoma cDNA. The main disadvantages of the
first approach are the considerable expense involved in
oligonucleotide synthesis, and the fact that the sequence of
VL and VH must be known before gene assembly is possible.
Consequently, the majority of the sFv reported to date were
generated by cloning from hybridoma cDNA; nevertheless, this
approach also has inherent disadvantages, because it requires
availability of the parent hybridoma or myeloma cell line, and
problems are often encountered when attempting to retrieve the
correct V region genes from heterologous cDNA. For example,
hybridomas in which the immortalizing fusion partner is
derived from MOPC-21 may express a VL kappa transcript which
is aberrantly rearranged at the VJ recombination site, and
which therefore encodes a non-functional light chain (Cabilly
& Riggs, 1985; Carroll et al., 1988). Cellular levels of this
transcript may exceed that generated from the productive VL
gene, so that a large proportion of the product on PCR
amplification of hybridoma cDNA will not encode a functional
light chain. A second disadvantage of the PCR-based method,
frequently encountered by the inventors, is the variable
success of recovering VH genes using the conditions so far
reported in the literature, presumably because the number of
mismatches between primers and the target sequence
destabilizes the hybrid to an extent which inhibits PCR
amplification.

Thus, methods of targeting toxins to specific cells
using single-chain antibodies have been difficult to
practice because of the difficulties in obtaining single-chain
antibodies and other targeting moieties. Also, none of the
proposed treatment methods has been fully successful, because
of the need to fuse the toxin to the targeting moiety, thus
disrupting either the toxin function or the targeting
function. Thus, a need exists for a means to target molecules
having a desired activity to a specific cell population.

Bacterial and plant protein toxins have evolved
novel and efficient strategies for penetrating to the cytosol
of mammalian cells, and this ability has been exploited to
develop anti-tumor and anti-HIV cytotoxic agents. Examples
include ricin and Pseudomonas exotoxin A (PE) chimeric toxins
and immunotoxins.

Pseudomonas exotoxin A (PE) is a toxin for which a
detailed analysis of functional domains exists. The sequence
is deposited with GenBank. Structural determination by X-ray
diffraction, expression of deleted proteins, and extensive
mutagenesis studies have defined three functional domains in
PE: a receptor-binding domain (residues 1-252 and 365-399)
designated Ia and Ib, a central translocation domain (amino
acids 253-364, domain II), and a carboxyl-terminal enzymatic
domain (amino acids 400-613, domain III). Domain III
catalyzes the ADP-ribosylation of elongation factor 2 (EF-2),
which results in inhibition of protein synthesis and cell
death. Recently it was also found that an extreme carboxyl
terminal sequence is essential for toxicity (Chaudhary et al.
Proc. Natl. Acad. Sci. U.S.A. 87:308-312, 1990; Seetharam et
al. J. Biol. Chem. 266:17376-17381, 1991). Since this
sequence is similar to the sequence that specifies retention
of proteins in the endoplasmic reticulum (ER) (Munro, S. and
Pelham, H.R.B. Cell 48:899-907, 1987), it was suggested that
PE must pass through the ER to gain access to the cytosol.
Detailed knowledge of the structure of PE has facilitated use
of domains II, Ib, and III (together designated PE40) in
hybrid toxins and immunotoxins.
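
The PE domain boundaries recited above can be captured in a small lookup table. The following Python sketch is illustrative only: the names PE_DOMAINS, PE40_DOMAINS, domain_of and pe40_span are hypothetical and are not part of the disclosure, and the residue ranges are simply those stated in the preceding paragraph (1-based, inclusive).

```python
# Domain boundaries of Pseudomonas exotoxin A (PE) as recited in the text.
# Ranges are 1-based and inclusive. All names here are illustrative.
PE_DOMAINS = {
    "Ia": (1, 252),     # receptor binding
    "II": (253, 364),   # translocation
    "Ib": (365, 399),   # receptor binding
    "III": (400, 613),  # ADP-ribosylation of EF-2
}

# PE40 is described as domains II, Ib, and III taken together.
PE40_DOMAINS = ("II", "Ib", "III")


def domain_of(residue: int) -> str:
    """Return the name of the PE domain containing a 1-based residue number."""
    for name, (start, end) in PE_DOMAINS.items():
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside PE (1-613)")


def pe40_span() -> tuple:
    """Return (first, last) residue covered by the PE40 domains."""
    starts = [PE_DOMAINS[d][0] for d in PE40_DOMAINS]
    ends = [PE_DOMAINS[d][1] for d in PE40_DOMAINS]
    return min(starts), max(ends)
```

Under these stated ranges, the PE40 domains cover residues 253 through 613.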

Bacillus anthracis produces three proteins which
when combined appropriately form two potent toxins,
collectively designated anthrax toxin. Protective antigen
(PA, 82,684 Da (Dalton) (SEQ ID NOS: 3 and 4)) and edema
factor (EF, 89,840 Da) combine to form edema toxin (ET), while
PA and lethal factor (LF, 90,237 Da (SEQ ID NOS: 1 and 2))
combine to form lethal toxin (LT) (Leppla, S.H. Alouf, J.E.
and Freer, J.H., eds. Academic Press, London 277-302, 1991).
ET and LT each conform to the AB toxin model, with PA
providing the target cell binding (B) function and EF or LF
acting as the effector or catalytic (A) moieties. A unique
feature of these toxins is that LF and EF have no toxicity in
the absence of PA, apparently because they cannot gain access
to the cytosol of eukaryotic cells.

The genes for each of the three anthrax toxin
components have been cloned and sequenced (Leppla, 1991).
This showed that LF and EF have extensive homology in amino
acid residues 1-300. Since LF and EF compete for binding to
PA63, it is highly likely that these amino-terminal regions
are responsible for binding to PA63. Direct evidence for this
was provided in a recent mutagenesis study (Quinn et al. J.
Biol. Chem. 266:20124-20130, 1991); all mutations made within
amino acid residues 1-210 of LF led to decreased binding to
PA63. The same study also suggested that the putative
catalytic domain of LF included residues 491-776 (Quinn et
al., 1991). In contrast, the location of functional domains
within the PA63 polypeptide is not obvious from inspection of
the deduced amino acid sequence. However, studies with
monoclonal antibodies and protease fragments (Leppla, 1991)
and subsequent mutagenesis studies (Singh et al. J. Biol.
Chem. 266:15493-15497, 1991) showed that residues at and near
the carboxyl terminus of PA are involved in binding to
receptor.

PA is capable of binding to the surface of many
types of cells. After PA binds to a specific receptor
(Leppla, 1991) on the surface of susceptible cells, it is
cleaved at a single site by a cell surface protease, probably
furin, to produce an amino-terminal 19-kDa fragment that is
released from the receptor/PA complex (Singh et al. J. Biol.
Chem. 264:19103-19107, 1989). Removal of this fragment from
PA exposes a high-affinity binding site for LF and EF on the
receptor-bound 63-kDa carboxyl-terminal fragment (PA63). The
complex of PA63 and LF or EF enters cells and probably passes
through acidified endosomes to reach the cytosol.

Cleavage of PA occurs after residues 164-167, 
Arg-Lys-Lys-Arg. This site is also susceptible to cleavage by 
trypsin and can be referred to as the trypsin cleavage site. 
Only after cleavage is PA able to bind either EF or LF to form 
either ET or LT. 
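
As a minimal sketch of the cleavage step described above, the following Python function splits a protein sequence immediately after the first Arg-Lys-Lys-Arg (RKKR) motif. The function name and the toy sequence are hypothetical; the toy sequence is a placeholder and is not the PA sequence of SEQ ID NO: 4.

```python
CLEAVAGE_MOTIF = "RKKR"  # Arg-Lys-Lys-Arg in one-letter code


def cleave_after_motif(sequence: str, motif: str = CLEAVAGE_MOTIF):
    """Split a protein sequence immediately after the first motif occurrence.

    Returns (n_terminal_fragment, c_terminal_fragment), or None when the
    motif is absent, i.e. when no cleavage at this site would be expected.
    """
    index = sequence.find(motif)
    if index == -1:
        return None
    cut = index + len(motif)
    return sequence[:cut], sequence[cut:]


# Toy placeholder sequence (NOT the real PA sequence), for illustration only.
toy_sequence = "MEVKQ" + "RKKR" + "STSAGPTV"
fragments = cleave_after_motif(toy_sequence)
```

In this sketch the amino-terminal fragment ends with the motif itself, mirroring cleavage after residues 164-167 as stated in the text.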

Prior work had shown that the carboxyl terminal PA
fragment (PA63) can form ion conductive channels in artificial
lipid membranes (Blaustein et al. Proc. Natl. Acad. Sci.
U.S.A. 86:2209-2213, 1989; Koehler, T. M. and Collier, R.J.
Mol. Microbiol. 5:1501-1506, 1991), and that LF bound to PA63
on cell surface receptors can be artificially translocated
across the plasma membrane to the cytosol by acidification of
the culture medium (Friedlander, A. M. J. Biol. Chem.
261:7123-7126, 1986). Furthermore, drugs that block endosome
acidification protect cells from LF (Gordon et al. J. Biol.
Chem. 264:14792-14796, 1989; Friedlander, 1986; Gordon et al.
Infect. Immun. 56:1066-1069, 1988). The mechanisms by which
EF is internalized have been studied in cultured cells by
measuring the increases in cAMP concentrations induced by PA
and EF (Leppla, S. H. Proc. Natl. Acad. Sci. U.S.A. 79:3162-
3166, 1982; Gordon et al., 1989). However, because assays of
cAMP are relatively expensive and not highly precise, this is
not a convenient method of analysis. Internalization of LF
has been analyzed only in mouse and rat macrophages, because
these are the only cell types lysed by the lethal toxin.

SUMMARY OF THE INVENTION 
The present invention provides a nucleic acid
encoding a fusion protein comprising a nucleotide sequence
encoding the PA binding domain of the native LF protein and a
nucleotide sequence encoding an activity inducing domain of a
second protein. Also provided is a nucleic acid encoding a
fusion protein comprising a nucleotide sequence encoding the
translocation domain and LF binding domain of the native PA
protein and a nucleotide sequence encoding a ligand domain
which specifically binds a cellular target. Proteins encoded
by the nucleic acids of the invention, vectors comprising the
nucleic acids and hosts capable of expressing the proteins
encoded by the nucleic acids are also provided.
A composition comprising the PA binding domain of
the native LF protein chemically attached to an activity
inducing moiety is further provided.

A method for delivering an activity to a cell is
provided. The steps of the method include administering to
the cell (a) a protein comprising the translocation domain and
the LF binding domain of the native PA protein and a ligand
domain and (b) a product comprising the PA binding domain of
the native LF protein and a non-LF activity inducing moiety,
whereby the product administered in step (b) is internalized
into the cell and performs the activity within the cell.

Characteristics unique to anthrax toxin are
exploited to make novel cell-specific cytotoxins. A site in
the PA protein of the toxin which must be proteolytically
cleaved for the activity-inducing moiety of the toxin to enter
the cell is replaced by the consensus sequence recognized by a
specific protease. Thus, the toxin will only act on cells
infected with intracellular pathogens which make that specific
protease.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a graph of the percent to which mutant
proteins are cleaved by purified HIV-1 protease. The mutant
proteins include protective antigen (PA) mutated to include
the HIV-1 protease cleavage site in place of the natural
trypsin cleavage site.

DESCRIPTION OF THE PREFERRED EMBODIMENT 
Nucleic Acids 

Lethal Factor (LF) 
The present invention provides an isolated nucleic
acid encoding a fusion protein comprising a nucleotide
sequence encoding the PA binding domain of the native LF
protein and a nucleotide sequence encoding an activity
inducing domain of a second protein. The LF gene and native
LF protein are shown in SEQ ID NOS: 1 and 2, respectively. The
PA gene and native PA protein are shown in SEQ ID NOS: 3 and 4,
respectively.

The second protein can be a toxin, for example
Pseudomonas exotoxin A (PE), the A chain of diphtheria toxin
or Shiga toxin. The activity inducing domains of numerous
other known toxins can be included in the fusion protein
encoded by the presently claimed nucleic acid. The activity
inducing domain need not be a toxin, but can have other
activities, including but not limited to stimulating or
reducing growth, selectively inhibiting DNA replication,
providing a desired gene, providing enzymatic activity or
providing a source of radiation. In any case, the fusion
proteins encoded by the nucleic acids of the present invention
must be capable of being internalized and capable of
expressing the specified activity in a cell. A given LF
fusion protein of the present invention can be tested for its
ability to be internalized and to express the desired activity
using methods as described herein, particularly in Examples 1
and 2.

An example of a nucleic acid of the invention
comprises the nucleotide sequence defined in the Sequence
Listing as SEQ ID NO: 5. This nucleic acid encodes a fusion
of LF residues 1-254 with the two-residue linker "TR" and PE
residues 401-602 (SEQ ID NO: 6). The protein includes a Met-
Val-Pro- sequence at the beginning of the LF sequence. Means
for obtaining this fusion protein are further described below
and in Example 1.

A further example of a nucleic acid of this
invention comprises the nucleotide sequence defined in the
Sequence Listing as SEQ ID NO: 7. This nucleic acid encodes a
fusion of LF residues 1-254 with the two-residue linker "TR"
and PE residues 398-613 (SEQ ID NO: 8). The junction point
containing the "TR" is the sequence LTRA, and the Met-Val-Pro-
is also present. This fusion protein and methods for
obtaining it are further described below and in Example 2.

Another example of the nucleic acid of the present
invention comprises the nucleotide sequence defined in the
Sequence Listing as SEQ ID NO: 9. This nucleic acid encodes a
fusion of LF residues 1-254 with the two-residue linker and
PE residues 362-613 (SEQ ID NO: 10). This fusion protein is
further described in Example 1.
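
For orientation, the three LF-PE fusions recited above can be tabulated as follows. This Python sketch is illustrative only: the class name LfPeFusion and the list FUSIONS are hypothetical, and the only data used are the SEQ ID NOs and residue ranges stated in the preceding paragraphs (1-based, inclusive).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfPeFusion:
    seq_id_nucleic: int  # SEQ ID NO of the nucleic acid
    seq_id_protein: int  # SEQ ID NO of the encoded fusion protein
    lf_range: tuple      # LF residues included, 1-based inclusive
    pe_range: tuple      # PE residues included, 1-based inclusive

    def lf_length(self) -> int:
        """Number of LF residues carried by the fusion."""
        start, end = self.lf_range
        return end - start + 1

    def pe_length(self) -> int:
        """Number of PE residues carried by the fusion."""
        start, end = self.pe_range
        return end - start + 1


# The three constructs as recited in the text.
FUSIONS = [
    LfPeFusion(5, 6, (1, 254), (401, 602)),
    LfPeFusion(7, 8, (1, 254), (398, 613)),
    LfPeFusion(9, 10, (1, 254), (362, 613)),
]

for fusion in FUSIONS:
    print(fusion.seq_id_nucleic, fusion.lf_length(), fusion.pe_length())
```

The sketch counts only the LF and PE residues; the linker and the Met-Val-Pro- sequence are deliberately left out of the arithmetic.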

Alternatively, the nucleic acid can include the
entire coding sequence for the LF protein fused to a non-LF
activity inducing domain. Other LF fusion proteins of various
sizes and methods of making and testing them for the desired
activity are also provided herein, particularly in Examples 1
and 2.

Protective Antigen (PA)

Also provided is an isolated nucleic acid encoding a
fusion protein comprising a nucleotide sequence encoding the
translocation domain and LF binding domain of the native PA
protein and a nucleotide sequence encoding a ligand domain
which specifically binds a cellular target.

An example of a nucleic acid of this invention
comprises the nucleotide sequence defined in the Sequence
Listing as SEQ ID NO: 11. This nucleic acid encodes a fusion
of PA residues 1-725 and human CD4 residues 1-178, the portion
which binds to gp120 exposed on HIV-1 infected cells (SEQ ID
NO: 12). This fusion protein and methods for obtaining and
testing fusion proteins are further described below and in
Examples 3, 4 and 5.

The PA fusion protein encoding nucleic acid provided
can encode any ligand domain that specifically binds a
cellular target, e.g., a cell surface receptor, an antigen
expressed on the cell surface, etc. For example, the nucleic
acid can encode a ligand domain that specifically binds to an
HIV protein expressed on the surface of an HIV-infected cell.
Such a ligand domain can be a single-chain antibody which is
expressed as a fusion protein as provided above and in
Examples 3, 4 and 5. Alternatively, the nucleic acid can
encode, for example, a ligand domain that is a growth factor,
as provided in Example 3.

Although the PA encoding sequence of the nucleic
acid encoding the PA fusion proteins of this invention need
only include the nucleotide sequence encoding the
translocation domain and LF binding domain of the native PA
protein, the nucleic acid can further comprise the nucleotide
sequence encoding the remainder of the native PA protein. Any
sequences to be included beyond those required can be
determined based on routine considerations such as ease of
manipulation of the nucleic acid, ease of expression of the
product in the host, and any effect on translocation/
internalization as taught in the Examples.

Proteins 

Proteins encoded by the nucleic acids of the present
invention are also provided.
LF Fusion Proteins 

The present invention provides LF fusion proteins
encoded by the nucleic acids of the invention as described
above and in the Examples. Specifically, fusions of the LF
gene with domains II, Ib, and III of PE can be made by
recombinant methods to produce in-frame translational fusions.
Recombinant genes (e.g., SEQ ID NOS: 5, 7 and 9) were
expressed in Escherichia coli (E. coli), and the purified
proteins were tested for activity on cultured cells as
provided in Examples 1 and 2. Certain fusion proteins are
efficiently internalized via the PA receptor to the cytosol.
These examples demonstrate that this system can be used to
deliver many different polypeptides into targeted cells.

Although specific examples of these proteins are
provided, given the present teachings regarding the
preparation of LF fusion proteins, other embodiments having
other activity inducing domains can be practiced using routine
skill.

Using current methods of genetic manipulation, a
variety of other activity inducing moieties (e.g.,
polypeptides) can be translated as fusion proteins with LF
which in turn can be internalized by cells when administered
with PA or PA fusion proteins. Fusion proteins generated by
this method can be screened for the desired activity using the
methods set forth in the Examples and by various routine
procedures. Based on the data presented here, the present
invention provides a highly effective system for delivery of
an activity inducing moiety into cells.

PA Fusion Proteins

The present invention provides PA fusion proteins
encoded by the nucleic acids of the invention. Specifically,
fusions of PA with single-chain antibodies and CD4 are
provided.

Using current methods of genetic manipulation, a
variety of other ligand domains (e.g., polypeptides) can be
translated as fusion proteins with PA which in turn can
specifically target cells and facilitate internalization of LF
or LF fusion proteins. Based on the data presented here, the
present invention provides a highly effective system for
delivery of an activity inducing moiety into a particular type
or class of cells.

Although specific examples of these proteins are
provided, given the present teachings regarding the
preparation of PA fusion proteins, other embodiments having
other ligand domains can be practiced using routine skill.
The fusion proteins generated can be screened for the desired
specificity and activity utilizing the methods set forth in
the Examples and by various routine procedures. In any case,
the PA fusion proteins encoded by the nucleic acids of the
present invention must be able to specifically bind the
selected target cell, bind LF or LF fusions or conjugates and
internalize the LF fusion/conjugate.

Conjugates

A composition comprising the PA binding domain of
the native LF protein chemically attached to an activity
inducing moiety is provided. Such an activity inducing moiety
provides an activity not present on native LF. The composition
can comprise an activity inducing moiety that is, for example, a
polypeptide, a radioisotope, an antisense nucleic acid or a
nucleic acid encoding a desired gene product.

Using current methods of chemical manipulation, a
variety of other moieties (e.g., polypeptides, nucleic acids,
radioisotopes, etc.) can be chemically attached to LF and can
be internalized into cells and can express their activity when
administered with PA or PA fusion proteins. The compounds can
be tested for the desired activity and internalization
following the methods set forth in the Examples. For example,
the present invention provides an LF protein fragment 1-254
(LF1-254) with a cysteine residue added at the end of LF1-254
(LF1-254Cys). Since there are no other cysteines in LF, this
single cysteine provides a convenient attachment point through
which to chemically conjugate other proteins or non-protein
moieties.
Vectors and Hosts 

A vector comprising the nucleic acids of the present
invention is also provided. The vectors of the invention can
be in a host capable of expressing the protein encoded by the
nucleic acid.

To express the proteins and conjugates of the
present invention, the nucleic acids can be operably linked to
signals that direct gene expression. A nucleic acid is
"operably linked" when it is placed into a functional
relationship with another nucleic acid sequence. For
instance, a promoter or enhancer is operably linked to a
coding sequence if it affects the transcription of the
sequence. Generally, operably linked means that the nucleic
acid sequences being linked are contiguous and, where
necessary to join two protein coding regions, contiguous and
in reading frame.
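
The requirement that two protein coding regions be joined contiguously and in reading frame can be illustrated with a simple check. The following Python sketch is a simplification under stated assumptions: the function name joins_in_frame is hypothetical, both inputs are assumed to be DNA strings that begin at the first nucleotide of a codon, and the standard stop codons TAA, TAG and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed)


def joins_in_frame(upstream: str, downstream: str) -> bool:
    """Return True if two coding regions can be joined contiguously in frame.

    The upstream region must be a whole number of codons and contain no
    stop codon, so that translation continues into the downstream region;
    the downstream region must also be a whole number of codons.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    if len(upstream) % 3 != 0 or len(downstream) % 3 != 0:
        return False
    codons = [upstream[i:i + 3] for i in range(0, len(upstream), 3)]
    return not any(codon in STOP_CODONS for codon in codons)


# Toy sequences for illustration only; they are not taken from the
# Sequence Listing.
print(joins_in_frame("ATGGTTCCA", "ACGCGTGCT"))  # True: both whole codons
print(joins_in_frame("ATGGTTCC", "ACGCGT"))      # False: frame shifted
```

A check of this kind addresses only the reading frame; it says nothing about whether the encoded fusion folds or functions.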

The gene encoding a protein of the invention can be
inserted into an "expression vector", "cloning vector", or
"vector", terms which usually refer to plasmids or other
nucleic acid molecules that are able to replicate in a chosen
host cell. Expression vectors can replicate autonomously, or
they can replicate by being inserted into the genome of the
host cell. Vectors that replicate autonomously will have an
origin of replication or autonomous replicating sequence (ARS)
that is functional in the chosen host cell(s). Often, it is
desirable for a vector to be usable in more than one host
cell, e.g., in E. coli for cloning and construction, and in a
mammalian cell for expression.

The particular vector used to transport the genetic
information into the cell is also not particularly critical.
Any of the conventional vectors used for expression of
recombinant proteins in prokaryotic or eukaryotic cells can be
used.

The expression vectors typically have a
transcription unit or expression cassette that contains all
the elements required for the expression of the DNA encoding a
protein of the invention in the host cells. A typical
expression cassette contains a promoter operably linked to the
DNA sequence encoding the protein, and signals required for
efficient polyadenylation of the transcript. The promoter is
preferably positioned about the same distance from the
heterologous transcription start site as it is from the
transcription start site in its natural setting. As is known
in the art, however, some variation in this distance can be
accommodated without loss of promoter function.

The DNA sequence encoding the protein of the
invention can be linked to a cleavable signal peptide sequence
to promote secretion of the encoded protein by the transformed
cell. Additional elements of the vector can include, for
example, selectable markers and enhancers. Selectable
markers, e.g., tetracycline resistance or hygromycin
resistance, permit detection and/or selection of those cells
transformed with the desired DNA sequences (see, e.g., U.S.
Patent 4,704,362).

Enhancer elements can stimulate transcription up to
1,000 fold from linked homologous or heterologous promoters.
Many enhancer elements derived from viruses have a broad host
range and are active in a variety of tissues. For example,
the SV40 early gene enhancer is suitable for many cell types.
Other enhancer/promoter combinations that are suitable for the
present invention include those derived from polyoma virus,
human or murine cytomegalovirus, the long terminal repeat from
various retroviruses such as murine leukemia virus, murine or
Rous sarcoma virus, and HIV. See, Enhancers and Eukaryotic
Expression, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
1983, which is incorporated herein by reference.

In addition to a promoter sequence, the expression
cassette should also contain a transcription termination
region downstream of the structural gene to provide for
efficient termination. The termination region can be obtained
from the same gene as the promoter sequence or can be obtained
from a different gene.

For more efficient translation in mammalian cells of
the mRNA encoded by the structural gene, polyadenylation
sequences are also commonly added to the vector construct.
Two distinct sequence elements are required for accurate and
efficient polyadenylation: GU or U rich sequences located
downstream from the polyadenylation site and a highly
conserved sequence of six nucleotides, AAUAAA, located 11-30
nucleotides upstream. Termination and polyadenylation signals
that are suitable for the present invention include those
derived from SV40, or a partial genomic copy of a gene already
resident on the expression vector.
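
The positional relationship between the AAUAAA hexamer and the polyadenylation site can be sketched as a simple search. The following Python function is illustrative only: its name and parameters are hypothetical, and the distance is measured, by assumption, from the first nucleotide of the hexamer to the polyadenylation site, since the text does not specify the convention. The downstream GU or U rich element is not checked.

```python
POLY_A_SIGNAL = "AAUAAA"  # conserved hexamer named in the text


def has_upstream_signal(rna: str, site: int,
                        min_gap: int = 11, max_gap: int = 30) -> bool:
    """Return True if AAUAAA lies min_gap..max_gap nt upstream of the site.

    `site` is the 0-based index of the polyadenylation site in `rna`.
    The gap is counted from the first nucleotide of the hexamer to the
    site (an assumed convention). DNA input is accepted by mapping T to U.
    """
    rna = rna.upper().replace("T", "U")
    start = rna.find(POLY_A_SIGNAL)
    while start != -1:
        gap = site - start
        if min_gap <= gap <= max_gap:
            return True
        start = rna.find(POLY_A_SIGNAL, start + 1)
    return False


# Toy transcript for illustration only: hexamer at index 2, site at index 22.
toy_rna = "GG" + "AAUAAA" + "C" * 20
print(has_upstream_signal(toy_rna, 22))  # gap of 20 nucleotides
```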

The vectors containing the gene encoding the protein
of the invention are transformed into host cells for
expression. "Transformation" refers to the introduction of
vectors containing the nucleic acids of interest directly into
host cells by well known methods. The particular procedure
used to introduce the genetic material into the host cell for
expression of the protein is not particularly critical. Any
of the well known procedures for introducing foreign
nucleotide sequences into host cells can be used. It is only
necessary that the particular procedure utilized be capable of
successfully introducing at least one gene into the host cell
which is capable of expressing the gene.

Transformation methods, which vary depending on the
type of host cell, include electroporation; transfection
employing calcium chloride, rubidium chloride, calcium
phosphate, DEAE-dextran, or other substances; microprojectile
bombardment; lipofection; infection (where the vector is an
infectious agent); and other methods. See, generally,
Sambrook et al., (1989) supra, and Current Protocols in
Molecular Biology, supra. Reference to cells into which the
nucleic acids described above have been introduced is meant to
also include the progeny of such cells.
There are numerous prokaryotic expression systems
known to one of ordinary skill in the art useful for the
expression of the antigen. E. coli is commonly used, and
other microbial hosts suitable for use include bacilli, such
as Bacillus subtilis, and other enterobacteriaceae, such as
Salmonella, Serratia, and various Pseudomonas species. One
can make expression vectors for use in these prokaryotic
hosts; the vectors will typically contain expression control
sequences compatible with the host cell (e.g., an origin of
replication, a promoter). Any number of a variety of well-
known promoters can be used, such as the lactose promoter
system, a tryptophan (Trp) promoter system, a beta-lactamase
promoter system, or a promoter from phage lambda. The
promoters will typically control expression, optionally with
an operator sequence, and have ribosome binding site
sequences, for example, for initiating and completing
transcription and translation. If necessary, an amino
terminal methionine can be provided by insertion of a Met
codon 5' and in-frame with the codons for the protein. Also,
the carboxy-terminal end of the protein can be removed using
standard oligonucleotide mutagenesis procedures, if desired.

Host bacterial cells can be chosen that are mutated
to be reduced in or free of proteases, so that the proteins
produced are not degraded. For Bacillus expression systems in
which the proteins are secreted into the culture medium,
strains are available that are deficient in secreted
proteases.

Mammalian cell lines can also be used as host cells
for the expression of polypeptides of the invention.
Propagation of mammalian cells in culture is per se well
known. See, Tissue Culture, Academic Press, Kruse and
Patterson, ed. (1973). Host cell lines may also include such
organisms as bacteria (e.g., E. coli or B. subtilis), yeast,
filamentous fungi, plant cells, or insect cells, among others.

Purification of Protein

After standard transfection or transformation
methods are used to produce prokaryotic, mammalian, yeast, or
insect cell lines that express large quantities of the protein
of the invention, the protein is then purified using standard
techniques which are known in the art. See, e.g., Colley et
al. (1989) J. Biol. Chem. 264:17619-17622; and Methods in
Enzymology, "Guide to Protein Purification", M. Deutscher,
ed. Vol. 182 (1990).

Standard procedures of the art that can be used to
purify proteins of the invention include ammonium sulfate
precipitation, affinity and fraction column chromatography,
gel electrophoresis and the like. See, generally, Scopes, R.,
Protein Purification, Springer-Verlag, New York (1982), and
U.S. Pat. No. 4,512,922 disclosing general methods for
purifying protein from recombinantly engineered bacteria.

If the expression system causes the protein of the 
invention to be secreted from the cells, the recombinant cells 
are grown and the protein is expressed, after which the 
culture medium is harvested for purification of the secreted 
protein. The medium is typically clarified by centrifugation 
or filtration to remove cells and cell debris and the proteins 
can be concentrated by adsorption to any suitable resin such 
as, for example, CDP-Sepharose, asialoprothrombin-Sepharose 
4B, or Q Sepharose, or by use of ammonium sulfate 
fractionation, polyethylene glycol precipitation, or by 
ultrafiltration. Other means known in the art are equally 
suitable. Further purification of the protein can be 
accomplished by standard techniques, for example, affinity 
chromatography, ion exchange chromatography, sizing 
chromatography, or other protein purification techniques used
to obtain homogeneity. The purified proteins are then used to
produce pharmaceutical compositions, as described below. 

Alternatively, vectors can be employed that express 
the protein intracellularly, rather than secreting the protein 
from the cells. In these cases, the cells are harvested, 
disrupted, and the protein is purified from the cellular 
extract, e.g., by standard methods. If the cell line has a
cell wall, then initial extraction in a low salt buffer may
allow the protein to pellet with the cell wall fraction. The
protein can be eluted from the cell wall with high salt
concentrations and dialyzed. If the cell line glycosylates
the protein, then purification of the glycoprotein may be
enhanced by using a Con A column. Anion exchange columns (MonoQ,
Pharmacia) and gel filtration columns may be used to further 
purify the protein. A highly purified preparation can be 
achieved at the expense of activity by denaturing preparative 
polyacrylamide gel electrophoresis. 

Protein analogs can be produced in multiple 
conformational forms which are detectable under nonreducing 
chromatographic conditions. Removal of those species having a 
low specific activity is desirable and is achieved by a 
variety of chromatographic techniques including anion exchange 
or size exclusion chromatography.

Recombinant analogs can be concentrated by pressure 
dialysis and buffer exchanged directly into volatile buffers 
(e.g., N-ethylmorpholine (NEM), ammonium bicarbonate, ammonium
acetate, and pyridine acetate). In addition, samples can be
directly freeze-dried from such volatile buffers resulting in 
a stable protein powder devoid of salt and detergents. In 
addition, freeze-dried samples of recombinant analogs can be 
efficiently resolubilized before use in buffers compatible 
with infusion (e.g., phosphate buffered saline). Other 
suitable buffers might include hydrochloride, hydrobromide, 
sulphate, acetate, benzoate, malate, citrate, glycine,
glutamate, and aspartate. 

Specific Embodiments 
Toxins Modified to Contain Intracellular Pathogen Protease 
Recognition Sites

One aspect of the invention exploits the fact that 
PA and other toxins must be proteolytically cleaved in order 
to acquire activity, in conjunction with the fact that some 
cells infected with an intracellular pathogen possess an 
active protease that has a relatively narrow substrate 
specificity (for example, HIV-infected cells). The protease
site found in the native toxin is replaced with an
intracellular pathogen specific protease site. Thus, the
protease in cells that are infected by the intracellular
pathogen cleaves the modified toxin, which then becomes active
and kills the cell.

Intracellular pathogens that can be targeted by the 
products and methods of the present invention include any 
pathogen that produces a protease having a specific 
recognition site. Such pathogens can include prokaryotes
(including rickettsia, Mycobacterium tuberculosis, etc.),
mycoplasma, eukaryotic pathogens (e.g., pathogenic fungi,
etc.), and viruses. One example of an intracellular pathogen
that produces a specific protease is human immunodeficiency
virus (HIV). The HIV-1 protease cleaves viral polyproteins to
generate functional structural proteins as well as the reverse
transcriptase and the protease itself. HIV-1 replication and
viral infectivity are absolutely dependent on the action of
the HIV-1 protease.

An intracellular pathogen specific protease site can
be introduced into any natural or recombinant toxin for which
proteolytic cleavage is required for toxicity. For example,
one can replace the anthrax PA trypsin cleavage site
(R164-167) of PA with the HIV-1 protease site. Alternatively, the
diphtheria toxin disulfide loop sequence (see O'Hare, et al.
FEBS 273(1,2):200-204 (Oct. 1990)) can be replaced with the
HIV-1 protease cleavage site in order to obtain a toxin
specific to HIV-1 infected cells. Similarly, the normally
occurring diphtheria toxin sequence at residues 191-194
(Williams, et al. J. Biol. Chem. 265(33):20673-20677 (1990))
can be replaced by an intracellular pathogen specific protease
site such as the HIV-1 protease cleavage sequence. The
DAB486-IL-2 fusion toxin of Williams and the improved
DAB389-IL-2 toxin are effective on HIV-1 infected cells, which
express high levels of the IL-2 receptor. Williams, J. Biol.
Chem. 265:20673. Addition of the HIV-1 protease cleavage site
would provide a further degree of specificity. Similarly, the
botulinum C2 toxin is like the anthrax toxin in
requiring a cleavage within a native protein subunit (see
Ohishi and Yanagimoto, Infection and Immunity 60(11):4648-4655
(Nov. 1992)), so it too can be made specific for cells
infected by an intracellular pathogen such as HIV-1.

In one embodiment of the invention, the protease
site of PA is replaced by the site recognized by the HIV-1
protease. The cellular protease that cleaves PA absolutely
requires the presence of the Arg 164 and Arg 167 residues,
because replacement of either residue yields a PA molecule
which is not cleaved after binding to the cell surface.
However, any PA substitution mutant which retains at least one
Arg or Lys residue within residues 164-167 can be activated by
treatment with trypsin. Because the PA63 fragments produced
by trypsin digestion have a variety of different amino
terminal residues, it is clear that there is not a strict
constraint on the identity of the terminal residues. Klimpel,
et al., Proc. Natl. Acad. Sci. 89:10277-10281 (1992).
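
The two activation rules in the preceding paragraph can be written as simple predicates over the four residues at positions 164-167. The following is a minimal sketch, not a description of any assay: the function names are hypothetical, and the native site string RKKR is an assumption, since the text itself names only Arg 164 and Arg 167. The first predicate encodes a necessary condition taken from the text, not a guarantee of cleavage.

```python
# Sketch of the activation rules stated for PA residues 164-167.
# NATIVE_SITE is an assumption: the text names only Arg 164 and Arg 167.
NATIVE_SITE = "RKKR"


def meets_cellular_protease_requirement(site: str) -> bool:
    """Necessary condition from the text: Arg at both position 164 and 167."""
    return len(site) == 4 and site[0] == "R" and site[3] == "R"


def activatable_by_trypsin(site: str) -> bool:
    """Trypsin activation needs at least one Arg or Lys within the site."""
    return any(residue in "RK" for residue in site)


if __name__ == "__main__":
    # Hypothetical substitution mutants, listed only to exercise the rules.
    for site in (NATIVE_SITE, "AKKR", "RKKA", "AAAA"):
        print(site,
              meets_cellular_protease_requirement(site),
              activatable_by_trypsin(site))
```

Under these rules a mutant such as AKKR fails the cellular protease requirement but remains activatable by trypsin, which is the distinction the paragraph draws.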

Replacement of residues 164-167 of PA with residues
that match the HIV-1 protease recognition site can render
exogenously added PA inactive on cells which do not possess
the HIV-1 protease. However, those cells that do express the
HIV-1 protease (i.e., cells infected with HIV-1 or cells
engineered to produce the protease) would cleave and thereby
activate the mutant PA. The activated PA proteins can then
bind and internalize cytotoxic fusion proteins, such as LF-PE,
added exogenously.

Based on extensive studies of the substrate
specificity of the protease, several PA variants were designed
and produced which relate to the invention. These are shown
below, with the residues underlined between which the cleavage
occurred. PA proteins which have been mutated to replace
R164-167 with an amino acid sequence recognized by the HIV-1
protease are referred to as "PAHIV."

PAHIV#1    QVSQNYPIVQNI
PAHIV#2    NTATIMMQRGNF
PAHIV#3    TVSFNFPQITLW
PAHIV#4    GGSAFNFPIVMGG

The mutant proteins PAHIV#(1-4) were cleaved correctly by the
HIV-1 protease.

Table 1 shows the amino acids and their corresponding
abbreviations and symbols.

Table 1

A    Ala    Alanine            M    Met    Methionine
C    Cys    Cysteine           N    Asn    Asparagine
D    Asp    Aspartic acid      P    Pro    Proline
E    Glu    Glutamic acid      Q    Gln    Glutamine
F    Phe    Phenylalanine      R    Arg    Arginine
G    Gly    Glycine            S    Ser    Serine
H    His    Histidine          T    Thr    Threonine
I    Ile    Isoleucine         V    Val    Valine
K    Lys    Lysine             W    Trp    Tryptophan
L    Leu    Leucine            Y    Tyr    Tyrosine



Preferably, the mutations at R164-167 of PA are
accomplished by cassette mutagenesis, although other methods
are feasible as discussed below. In summary, three pieces of
DNA are joined together. The first piece has vector sequences
and encodes the "front half" (5' end of the gene) of PA
protein, the second is a short piece of DNA (a cassette) and
encodes a small middle piece of PA protein, and the third
encodes the "back half" (3' end of the gene) of PA. The
cassette contains codons for the amino acids that are required
to complete the cleavage site for the intracellular pathogen
protease. This method was used to make mutants in the plasmid
pYS5, although other plasmids could be employed.
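
The three-piece joining described above can be illustrated at the sequence level. The sketch below is an assumption-laden toy: the DNA strings are short hypothetical placeholders, not PA or pYS5 sequences, and each piece is assumed to begin and end on a codon boundary so that the reading frame is preserved after joining. It concatenates a front piece, a cassette, and a back piece, then translates the result with the standard genetic code to confirm that the cassette codons appear in frame.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def join_three_pieces(front: str, cassette: str, back: str) -> str:
    """Join front, cassette, and back pieces, checking that each is in frame."""
    for name, piece in (("front", front), ("cassette", cassette), ("back", back)):
        if len(piece) % 3:
            raise ValueError(f"{name} piece would shift the reading frame")
    return front + cassette + back


if __name__ == "__main__":
    # Hypothetical placeholder sequences, chosen only for illustration.
    gene = join_three_pieces("ATGGCTTCT", "AACTATCCGATT", "GGTAAATAA")
    print(translate(gene))
```

In this toy case the cassette contributes the residues N-Y-P-I between the residues encoded by the front and back pieces.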

Alternatively, the mutations can be accomplished by
use of the polymerase chain reaction (PCR) and other methods
as discussed below. PCR duplicates a segment of DNA many
times, resulting in an amplification of that segment. The
reaction produces enough of the segment of DNA so that it can
be modified with restriction enzymes and cloned. During the
reaction a synthetic oligonucleotide primer is used to start
the duplication of the target DNA segment. Each synthetic
primer can be designed to introduce novel DNA sequences into
the DNA molecule, or to change existing DNA sequences.

Modification of Toxins to Broaden or Alter Target Cell
Specificity

Another aspect of the invention involves compounds and 
methods for broadening or changing the range of cell types 
against which a toxin is effective. For example, the lethal 
anthrax toxin, PA+LF, is acutely toxic to mouse macrophage
cells, apparently due to the specific expression in these
cells of a target for the catalytic activity of LF. Other
cell types are not affected by LF. However, in the present 
invention, LF is used to construct cytotoxins having broad 
cell specificity. 

A detailed analysis of the domains of LF identified
the amino-terminal 254 amino acids as the region that binds to
PA63. Fusion proteins containing residues 1-254 of LF and the
ADP-ribosylation domain of Pseudomonas exotoxin A (PE) were
designed according to the invention. These fusion proteins
are highly toxic to cultured cells, but only when PA is
administered simultaneously.

Synthesis of Genes that Encode Proteins of the Invention

Genes that encode toxins having altered protease
recognition sites or fusion proteins having a binding domain
from one protein and an activity inducing domain of a second
protein can be synthesized by methods known to those skilled
in the art. As an example of techniques that can be utilized,
the synthesis of genes encoding modified anthrax toxin
subunits LF and PA is now described.

The DNA sequences for native PA and LF are known.
Knowledge of these DNA sequences facilitates the preparation
of genes and can be used as a starting point to construct DNA
molecules that encode mutants of PA and/or LF. The protein
mutants of the invention are soluble and include internal
amino acid substitutions. Furthermore, these mutants are
purified from, or secreted from, cells that have been
transfected or transformed with plasmids containing genes
which encode these proteins. Methods for making
modifications, such as amino acid substitutions, deletions, or
the addition of signal sequences to cloned genes are known.
Specific methods used herein are described below.

The gene for PA or LF can be prepared by several
methods. Genomic and cDNA libraries are commercially
available. Oligonucleotide probes, specific to the desired
gene, can be synthesized using the known gene sequence.
Methods for screening genomic and cDNA libraries with
oligonucleotide probes are known. A genomic or cDNA clone can
provide the necessary starting material to construct an
expression plasmid for the desired protein using known
methods.

A protein encoding DNA fragment can be cloned by
taking advantage of restriction endonuclease sites which have
been identified in regions which flank or are internal to the
gene. See Sambrook, et al., Molecular Cloning: A Laboratory
Manual, 2d ed., Cold Spring Harbor Laboratory Press (1989),
"Sambrook" hereinafter.

Genes encoding the desired protein can be made from
wild-type genes constructed using the gene encoding the full
length protein. One method for producing wild-type genes for
subsequent mutation combines the use of synthetic
oligonucleotide primers with polymerase extension on an mRNA or
DNA template. This PCR method amplifies the desired
nucleotide sequence. U.S. Patents 4,683,195 and 4,683,202
describe this method. Restriction endonuclease sites can be
incorporated into the primers. Genes amplified by PCR can be
purified from agarose gels and cloned into an appropriate
vector. Alterations in the natural gene sequence can be
introduced by techniques such as in vitro mutagenesis and PCR
using primers that have been designed to incorporate
appropriate mutations.

The proteins described herein can be expressed
intracellularly and purified, or can be secreted when
expressed in cell culture. If desired, secretion can be
obtained by the use of the native signal sequence of the gene.
Alternatively, genes encoding the proteins of the invention
can be ligated in proper reading frame to a signal sequence
other than that corresponding to the native gene. Though the
PA recombinant proteins of the invention are typically
expressed in B. anthracis, they can be expressed in other
hosts, such as E. coli.

The proteins of this invention are described by their
amino acid sequences and by their nucleotide sequence, it
being understood that the proteins include their biological
equivalents such that this invention includes minor or
inadvertent substitutions and deletions of amino acids that
have substantially little impact on the biological properties
of the analogs. In some circumstances it may be feasible to
substitute rare or non-naturally occurring amino acids for one
or more of the twenty common amino acids listed in Table 1.
Examples include ornithine and acetylated or hydroxylated
forms. See generally Stryer, L., Biochemistry 3d ed. (1988).

Alternative nucleotide sequences can be used to
express analogs in various host cells. Furthermore, due to
the degeneracy of the genetic code, equivalent codons can be
substituted to encode the same polypeptide sequence.
Additionally, sequences (nucleotide and amino acid) with
substantial identity to those of the invention are also
included. Identity in this sense means the same identity (of
base pair or amino acid) and order (of base pairs or amino
acids). Substantial identity includes entities that are
greater than 80% identical. Preferably, substantial identity
refers to greater than 90% identity. More preferably, it
refers to greater than 95% identity.
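
The identity thresholds above can be made concrete with a small calculation. The sketch below assumes the two sequences are already aligned and of equal length with no gaps; the text does not specify an alignment procedure, so that simplification, the function names, and the tier labels are all assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with the same residue (or base) in the same order.

    Assumes the sequences are pre-aligned, gap-free, and of equal length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identity_tier(pct: float) -> str:
    """Map a percent identity onto the three thresholds named in the text."""
    if pct > 95:
        return "more preferred (>95%)"
    if pct > 90:
        return "preferred (>90%)"
    if pct > 80:
        return "substantial (>80%)"
    return "below threshold"


if __name__ == "__main__":
    reference = "QVSQNYPIVQNI"   # PAHIV#1 insert as listed in the text
    variant = "QVSQNYPIVQNL"     # hypothetical single substitution
    pct = percent_identity(reference, variant)
    print(round(pct, 2), identity_tier(pct))
```

A 12-residue sequence with one substitution is 11/12 identical, about 91.7%, which falls in the greater-than-90% tier but not the greater-than-95% tier.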

Mutagenesis 

Mutagenesis can be performed to yield point mutations,
deletions, or insertions to alter the specific regions of the
genes described above. Point mutations can be introduced by a
variety of methods including chemical mutagenesis, mutagenic
copying methods and site specific mutagenesis methods using
synthetic oligonucleotides.

Cassette mutagenesis methods are conveniently used to
introduce point mutations into the specified regions of the PA
or LF genes. A double-stranded oligonucleotide region
containing alterations in the specified sequences of the gene
is prepared. This oligonucleotide cassette region can be
prepared by synthesizing an oligonucleotide with the sequence
alteration in residues of the PA or LF gene, annealing to a
primer, elongating with the large fragment of DNA polymerase
and trimming with BstBI. This double-stranded oligonucleotide
is ligated into the BamHI/BstBI fragment from pYS5 and the
PpuMI-BamHI fragment from pYS6 to produce an intact
recombinant DNA. Other methods of producing the double
stranded oligonucleotides and other recombinant DNA vectors
can be practiced.

Chemical mutagenesis can be performed using the M13
vector system. A single strand M13 recombinant DNA is
prepared containing recombinant PA or LF DNA. Another M13
recombinant containing the same recombinant DNA but in double
stranded form is used to prepare a deletion in the targeted
region of the gene. This double stranded M13 recombinant is
cleaved into a linear molecule with an endonuclease,
denatured, and annealed with the single strand M13
recombinant, resulting in a single strand gap in the target
region of the PA or LF DNA.

This gapped DNA M13 recombinant is then treated with a
compound such as sodium bisulfite to deaminate the cytosine
residues in the single strand DNA region to uracil. This
results in limited and specific mutations in the single strand
DNA region. Finally, the gap in the DNA is filled in by
incubation with DNA polymerase, resulting in a U-A base pair
to replace a G-C base pair in the unmutated portion of the
gene. Upon replication the new recombinant gene contains T-A
base pairs, which are point mutations from the original
sequence. Other forms of chemical mutagenesis are also
available.
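
The sequence-level effect of the bisulfite step can be sketched as a string transformation. This is a simplified model under stated assumptions: every cytosine inside the single-stranded gap is converted, whereas the text describes the resulting mutations as limited, and the strand and gap coordinates are hypothetical placeholders. Cytosines outside the gap are double-stranded in this model and are left unchanged.

```python
def bisulfite_deaminate(strand: str, gap_start: int, gap_end: int) -> str:
    """Convert C to U only within the single-stranded gap [gap_start, gap_end).

    Simplification: conversion is treated as complete inside the gap.
    """
    if not 0 <= gap_start <= gap_end <= len(strand):
        raise ValueError("gap coordinates fall outside the strand")
    gap = strand[gap_start:gap_end].replace("C", "U")
    return strand[:gap_start] + gap + strand[gap_end:]


def after_replication(strand: str) -> str:
    """U pairs with A, so replicated copies carry T where U was present."""
    return strand.replace("U", "T")


if __name__ == "__main__":
    original = "ATGCCGTACGGA"            # hypothetical template strand
    treated = bisulfite_deaminate(original, 3, 8)
    print(treated)                        # gap cytosines now read as U
    print(after_replication(treated))     # fixed as C-to-T point mutations
```

Restricting the conversion to the gap coordinates is what makes the mutations specific to the targeted region in this model.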

Mutagenic copying of the PA or LF recombinant DNA can
be carried out using several methods. For example, a
single-stranded gapped DNA region is created as described above.
This region is incubated with DNA polymerase I and one or more
mutagenic analogs of normal deoxyribonucleoside triphosphates.
Copying of the single stranded region with the DNA polymerase
substitutes the mutagenic analogs as the single strand gap
region is filled in. Transfection and replication of the
resulting DNA results in production of some mutated
recombinant DNAs for PA, LF, or EF which can then be selected
by cloning. Other mutagenic copying methods can be used.

Point mutations can be introduced into the specified
regions of the PA or LF genes by methods using synthetic
oligonucleotides for site-specific mutagenesis. PCR copying
of the PA or LF genes is performed using oligonucleotide
primers which cover the specified target regions and which
contain modifications from the wild type sequence in these
regions. The PA gene in a pYS5 vector can be PCR amplified
using this method to result in mutations in the 164-167
position. PCR amplification can also be used to introduce
mutations in the target region of the LF gene.

Synthetic oligonucleotide methods of introducing point
mutations can be performed using heteroduplex DNA. An M13
recombinant DNA vector containing the PA or LF gene is
prepared and a single-stranded M13 recombinant is produced. A
single strand oligonucleotide containing an alteration in the
specified target sequence for the PA or LF gene is annealed to
the single strand M13 recombinant to produce a mismatched
sequence. Incubation with DNA polymerase I results in a
double-stranded M13 recombinant containing base pair
mismatches in the specified region of the gene. This M13
recombinant is replicated in a host such as B. anthracis or E.
coli to produce both wild type and mutant M13 recombinants.
The mutated M13 recombinants are cloned and isolated. Other
vector systems for mutagenesis involving synthetic nucleotides
and heteroduplex formation can be applicable.



Expression of Proteins in Prokaryotic Cells

In addition to the use of cloning methods in bacteria
such as Bacillus anthracis for amplification of cloned
sequences, it may be desirable to express the proteins in
other prokaryotes. It is possible to recover a functional
protein from E. coli transformed with an expression plasmid
encoding a PA or LF protein. Conveniently, the mutated PA
proteins of the invention were expressed in B. anthracis and
the LF-fusion proteins were expressed in E. coli.

Methods for the expression of cloned genes in bacteria
are well known. See Sambrook. To optimize expression of a
cloned gene in a prokaryotic system, expression vectors can be
constructed which include a promoter to direct mRNA
transcription and a transcription termination sequence. The
inclusion of selection markers in DNA vectors transformed into
bacteria is useful. Examples
of such markers include the genes specifying resistance to
ampicillin, tetracycline, or chloramphenicol.

See Sambrook, previously cited, for details concerning 
selection markers and promoters for use in bacteria such as 
E. coli. In an embodiment of this invention, pYS5 is a vector
for the subcloning and amplification of desired gene sequences,
although other vectors could be used.

Strains of Bacillus anthracis producing mutated protein(s)

For PA protein production, B. anthracis strains cured
of both pXO1 and pXO2 are preferred because they are
avirulent. Examples of such strains are UM23C1-1 and
UM44-1C9, obtained from Curtis Thorne, University of
Massachusetts. Similar strains can be made by curing of
plasmids, as described by P. Mikesell, et al., "Evidence for
plasmid-mediated toxin production in Bacillus anthracis,"
Infect. Immun. 39:371-376 (1983).

See generally commonly assigned U.S. Patent 
Application Serial No. 08/042,745, filed April 5, 1993, 
incorporated by reference herein. 

Treatment Methods

A method for delivering a desired activity to a cell
is provided. The steps of the method include administering to
the cell (a) a protein comprising the translocation domain and
the LF binding domain of the native PA protein and a ligand
domain, and (b) a product comprising the PA binding domain of
the native LF protein and a non-LF activity inducing moiety,
whereby the product administered in step (b) is internalized
into the cell and performs the activity within the cell.

The method of delivering an activity to a cell can use
a ligand domain that is the receptor binding domain of the
native PA protein. Other ligand domains are selected for
their specificity for a particular cell type or class of
cells. The specificity of the PA fusion protein for the
targeted cell can be determined using standard methods and as
described in Examples 2 and 3.

The method of delivering an activity to a cell can use
an activity inducing moiety that is a polypeptide, for example
a growth factor, a toxin, an antisense nucleic acid, or a
nucleic acid encoding a desired gene product. The actual
activity inducing moiety used will be selected based on its
functional characteristics, e.g., its activity.

A method of killing a tumor cell in a subject is also
provided. The steps of the method can include administering
to the subject a first fusion protein comprising the
translocation domain and LF binding domain of the native PA
protein and a tumor cell specific ligand domain in an amount
sufficient to bind to a tumor cell. A second fusion protein
is also administered wherein the protein comprises the PA
binding domain of the native LF protein and a cytotoxic domain
of a non-LF protein in an amount sufficient to bind to the
first protein, whereby the second protein is internalized into
the tumor cell and kills the tumor cell.

The cytotoxic domain can be a toxin or it can be
another moiety not strictly defined as a toxin, but which has
an activity that results in cell death. These cytotoxic
moieties can be selected using standard tests of cytotoxicity,
such as the cell lysis and protein synthesis inhibition assays
described in the examples.

The invention further provides a method of killing
HIV-infected cells in a subject. The method comprises the
steps of administering to the subject a first fusion protein
comprising the translocation domain and LF binding domain of
the native PA protein and a ligand domain that specifically
binds to an HIV protein expressed on the surface of an
HIV-infected cell, in an amount sufficient to bind to an
HIV-infected cell. The next step is administering to the subject
a second fusion protein comprising the PA binding domain of
the native LF protein and a cytotoxic domain of a non-LF
protein, in an amount sufficient to bind to the first protein,
whereby the second protein is internalized into the
HIV-infected cell and kills the HIV-infected cell, thereby
preventing propagation of HIV.

Although certain of the methods of the invention have
been described as using LF fusion proteins, it will be
understood that other LF compositions having chemically
attached activity inducing moieties can be used in the
methods.

The fusion proteins and other compositions of the
invention can be administered by various methods, e.g.,
parenterally, intramuscularly or intraperitoneally.

The amount necessary can be deduced from other
receptor/ligand or antibody/antigen therapies. The amount can
be optimized by routine procedures. The exact amount of such
LF and PA compositions required will vary from subject to
subject, depending on the species, age, weight and general
condition of the subject, the severity of the disease that is
being treated, the particular fusion protein or composition
used, its mode of administration, and the like. Generally,
dosage will approximate that which is typical for the
administration of cell surface receptor ligands, and will
preferably be in the range of about 2 µg/kg/day to 2
mg/kg/day.

Depending on the intended mode of administration, the
compounds of the present invention can be in various
pharmaceutical compositions. The compositions will include,
as noted above, an effective amount of the selected protein in
combination with a pharmaceutically acceptable carrier and, in
addition, can include other medicinal agents, pharmaceutical
agents, carriers, adjuvants, diluents, etc. By
"pharmaceutically acceptable" is meant a material that is not
biologically or otherwise undesirable, i.e., the material can
be administered to an individual along with the fusion protein
or other composition without causing any undesirable
biological effects or interacting in a deleterious manner with
any of the other components of the pharmaceutical composition
in which it is contained.

Parenteral administration, if used, is generally
characterized by injection. Injectables can be prepared in
conventional forms, either as liquid solutions or suspensions,
solid forms suitable for solution or suspension in liquid
prior to injection, or as emulsions. A more recently revised
approach for parenteral administration involves use of a slow
release or sustained release system, such that a constant
level of dosage is maintained. See, e.g., U.S. Patent No.
3,710,795, which is incorporated by reference herein.

Formulations and Administration 

Proteins of the invention such as PAHIV are typically
mixed with a physiologically acceptable fluid prior to
administration to a mammal such as a human. Examples of
physiologically acceptable fluids include saline solutions
such as normal saline, Ringer's solution, and generally
mixtures of various salts including potassium and phosphate
salts with or without sugar additives such as glucose. The
proteins are administered parenterally with intravenous
administration being the most typical route. Either a bolus
of the protein in solution or a slow infusion can be
administered intravenously. The choice of a bolus or an
infusion depends on the kinetics, including the half-life, of
the protein in the patient. An appropriate evaluation of the
time for delivery of the protein is well within the skill of
the clinician.

Patients selected for treatment with PAHIV are
infected with HIV-1 and they may or may not be symptomatic.
Optimally, the protein would be administered to an HIV-1
infected person who is not yet symptomatic. The dosage range
of a protein of the invention such as PAHIV is typically from
about 5 to about 25 micrograms per kilogram of body weight of
the patient. Usually, the dose is about 10 micrograms per
kilogram of body weight of the patient. The dosage is
repeated at regular intervals, such as weekly for about 4 to 6
weeks. At that time the clinician may opt to evaluate the
patient's immune status, including immuno-tolerance to the
PAHIV, to decide future treatment.
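
The dosage figures above reduce to simple arithmetic, sketched here for illustration only. The body weight is a hypothetical value, the function names are assumptions, and the range check simply mirrors the stated "about 5 to about 25" micrograms per kilogram without modeling the word "about".

```python
DOSE_RANGE_UG_PER_KG = (5.0, 25.0)   # typical range stated in the text
USUAL_DOSE_UG_PER_KG = 10.0          # usual dose stated in the text


def dose_micrograms(weight_kg: float,
                    dose_ug_per_kg: float = USUAL_DOSE_UG_PER_KG) -> float:
    """Single dose in micrograms for a given body weight."""
    low, high = DOSE_RANGE_UG_PER_KG
    if weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if not low <= dose_ug_per_kg <= high:
        raise ValueError("dose per kilogram is outside the stated range")
    return weight_kg * dose_ug_per_kg


def course_total_micrograms(weight_kg: float, weeks: int,
                            dose_ug_per_kg: float = USUAL_DOSE_UG_PER_KG) -> float:
    """Total over a course of weekly doses (the text gives about 4 to 6 weeks)."""
    return dose_micrograms(weight_kg, dose_ug_per_kg) * weeks


if __name__ == "__main__":
    weight = 70.0  # hypothetical body weight in kilograms
    print(dose_micrograms(weight))              # single dose at 10 ug/kg
    print(course_total_micrograms(weight, 6))   # six weekly doses
```

For the hypothetical 70 kg case, the usual dose works out to 700 micrograms per administration, or 4200 micrograms over six weekly doses.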

The foregoing description and the following examples
are offered primarily for purposes of illustration. It will
be readily apparent to those skilled in the art that the
operating conditions, materials, procedural steps and other
parameters of the system described herein can be further
modified or substituted in various ways without departing from
the spirit and scope of the invention. For example, although
human use has been discussed, veterinary use of the invention
is also feasible. For instance, cats suffer from a so-called
feline AIDS or feline immunodeficiency virus (FIV).
Protective antigen can be altered to include a protease
cleavage site specific for FIV. Thus, the invention is not
limited by the description and examples, but rather by the
appended claims.

EXAMPLE 1 

Fusions of Anthrax Toxin Lethal Factor to the
ADP-Ribosylation Domain of Pseudomonas Exotoxin

Reagents and General Procedures 

Restriction endonucleases and DNA modifying enzymes
were purchased from GIBCO/BRL, Boehringer Mannheim, or New
England Biolabs. Low melting point agarose (Sea Plaque) was
obtained from FMC Corp. (Rockland, ME). Oligonucleotides were
synthesized on a PCR Mate (Applied Biosystems) and purified on
oligonucleotide purification cartridges (Applied Biosystems).
The PCR was performed with a DNA amplification reagent
(GeneAmp) from Perkin-Elmer Cetus Instruments and a thermal
cycler (Perkin-Elmer Cetus). The amplification involved
denaturation at 94°C for 1 min, annealing at 55°C for 2.5 min
and extension at 72°C for 3 min, for 30 cycles. A final
extension was run at 72°C for 7 min. For amplification of PE
fragments, 10% formamide was added in the reaction mixture to
decrease the effect of high GC content. DNA sequencing
reactions were done using the Sequenase version 1.0 from U.S.
Biochemical Corp. and DNA sequencing gels were made from Gel
Mix 6 from GIBCO/BRL. [35S]deoxyadenosine
5'-[α-thio]triphosphate and L-[3,4,5-3H]leucine were purchased from
Dupont-New England Nuclear. J774A.1 cells were obtained from
American Type Culture Collection. Chinese Hamster Ovary (CHO)
cells were obtained from Michael Gottesman (National Cancer
Institute, National Institutes of Health) (ATCC CCL 61).
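
The cycling conditions given above (94°C for 1 min, 55°C for 2.5 min, 72°C for 3 min, 30 cycles, then 72°C for 7 min) can be captured as a small data structure, which makes the programmed hold time easy to compute. This is a sketch only: the class and function names are hypothetical, and ramp times between temperatures are not given in the text and are therefore excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Values taken from the amplification conditions stated in the text.
CYCLE = (
    Step("denaturation", 94.0, 1.0),
    Step("annealing", 55.0, 2.5),
    Step("extension", 72.0, 3.0),
)
CYCLES = 30
FINAL_EXTENSION = Step("final extension", 72.0, 7.0)


def total_hold_minutes(cycle=CYCLE, cycles=CYCLES, final=FINAL_EXTENSION) -> float:
    """Sum of programmed hold times; ramping between steps is not counted."""
    return cycles * sum(step.minutes for step in cycle) + final.minutes


if __name__ == "__main__":
    print(total_hold_minutes())
```

Each cycle holds for 6.5 min, so 30 cycles plus the final extension come to 202 min of programmed hold time.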
Plasmid Construction 

Construction of plasmids containing LF-PE fusions was
performed as follows. Varying portions of the PE gene were
amplified by PCR, ligated in frame to the 3' end of the LF
gene, and inserted into the pVEX115 f+T expression vector
(provided by V. K. Chaudhary, National Cancer Institute,
National Institutes of Health). To construct fusion proteins,
the 3'-end of the native LF gene (including codon 776 of the
mature protein, specifying Ser) was ligated with the 5'-ends
of sequences specifying varying portions of domains II, Ib,
and III of PE. The LF gene was amplified from the plasmid
pLF7 (Robertson, D. L. and Leppla, S.H. Gene 44:71-78, 1986)
by PCR using oligonucleotide primers which added KpnI and MluI
sites at the 5' and the 3' ends of the gene, respectively.
Similarly, varying portions of the PE gene (provided by David
FitzGerald, National Cancer Institute, National Institutes of
Health) were amplified by PCR so as to add MluI and EcoRI
sites at the 5' and 3' ends. The PCR product of the LF gene
was digested with KpnI and the DNA was precipitated. The LF
gene was subsequently treated with MluI. Similarly, the PCR
products of PE amplification were digested with MluI and
EcoRI. The expression vector pVEX115 f+T was cleaved with
KpnI and EcoRI separately and dephosphorylated. This vector
has a T7 promoter, OmpA signal sequence, multiple cloning
site, and T7 transcription terminator. All the above DNA
fragments were purified from low-melting point agarose, a
three-fragment ligation was carried out, and the product
transformed into E. coli DH5α (ATCC 53868). The four
constructs described in this report have the entire LF gene
fused to varying portions of PE. The identity of each
construct was confirmed by sequencing the junction point using
a Sequenase kit (U.S. Biochemical Corp.). For expression,
recombinant plasmids were transformed into E. coli strain 



WO 94/18332 



PCT/US94/01624 



31 

SA2821 (provided by Sankar Adhya, National Cancer Institute, 
National Institutes of Health, which is a derivative of 
BL2KXDE3) (S'tudier, F. W. and Moffat t # B.A. J- Mol. Biol. 
189:113-150, 1986) . This strain has the T7 RNA polymerase 
5 gene under control of an inducible lac promotor and also 
contains the degP mutation, which eliminates a major 
periplasmic protease (Strauch et al. J. Bacteriol. 171:2689- 
2696, 1989). 

In the resulting plasmids, the LF-PE fusion genes are
under control of the T7 promoter and contain an OmpA signal
peptide to obtain secretion of the products to the periplasm
so as to facilitate purification. The design of the PCR
linkers also led to insertion of two non-native amino acids,
Thr-Arg, at the LF-PE junction. The four fusions analyzed in
this report contain the entire 776 amino acids of mature LF,
the two added residues TR (Thr-Arg), and varying portions of
PE. In fusion FP33, the carboxyl-terminal end of PE was
changed from the native REDLK (Arg-Glu-Asp-Leu-Lys) to LDER, a
sequence that fails to cause retention in the ER (endoplasmic
reticulum).
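The composition just described (776 LF residues, the Thr-Arg linker, and a PE segment) fixes the length of each fusion once the PE residue range is known. The sketch below tallies the lengths using the PE ranges listed in Table 1 of this example. The average residue mass of 110 Da is an assumption introduced here to give a rough size estimate; it is not a value from this document, and the function names are hypothetical.

```python
LF_RESIDUES = 776        # mature LF, per the text
LINKER = "TR"            # Thr-Arg introduced by the PCR linkers
AVG_RESIDUE_DA = 110.0   # assumption: rough average mass per residue

# PE residue ranges (inclusive) as listed in Table 1 of this example.
PE_RANGES = {
    "FP2": (251, 613),
    "FP4": (362, 613),
    "FP23": (279, 613),
    "FP33": (362, 612),  # REDLK replaced by LDER, so one residue shorter
}


def fusion_length(pe_start, pe_end):
    """Total residues: LF + linker + the inclusive PE range."""
    return LF_RESIDUES + len(LINKER) + (pe_end - pe_start + 1)


def approx_mass_kda(n_residues):
    """Very rough mass estimate from residue count (assumed average mass)."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    for name, (start, end) in PE_RANGES.items():
        n = fusion_length(start, end)
        print(f"{name}: {n} residues, roughly {approx_mass_kda(n):.0f} kDa")
```

Under that assumption every construct comes out above 106 kDa, which agrees with the gel migration reported for the purified proteins.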

Expression and Purification of Fusion Proteins 

Fusion proteins produced from pNA2, pNA4, pNA23 and
pNA33 were designated FP2, FP4, FP23 and FP33, respectively.
E. coli strains carrying the recombinant plasmids were grown
in super broth (32 g/L Tryptone, 20 g/L yeast extract, 5 g/L
NaCl, pH 7.5) with 100 µg/ml of ampicillin with shaking at
225 rpm at 37°C in 2-L cultures. When A₆₀₀ reached 0.8-1.0,
isopropyl-1-thio-β-D-galactopyranoside was added to a final
concentration of 1 mM, and cultures were incubated an
additional 2 hr. EDTA and 1,10-o-phenanthroline were added to
5 mM and 0.1 mM, respectively, and the bacteria were harvested
by centrifugation at 4000 x g for 15 min at 4°C. For
extraction of the periplasmic contents, cells were suspended
in 75 ml of 20% sucrose containing 30 mM Tris and 1 mM EDTA,
incubated at 0°C for 10 min, and centrifuged at 8000 x g for
15 min at 4°C. Cells were resuspended gently in 50 ml of cold
distilled water, kept on ice for 10 min, and the spheroplasts
were pelleted. The supernatant was concentrated with
Centriprep-100 units (Amicon) and loaded on a Sephacryl S-200
column (40 x 2 cm), and 1 ml fractions were collected.

Fractions having full length fusion protein as
determined by immunoblots were pooled and concentrated as
above. Protein was then purified on an anion exchange column
(MonoQ HR5/5, Pharmacia-LKB) using a NaCl gradient. The
fusion proteins eluted at 280-300 mM NaCl. The proteins were
concentrated again on Centriprep-100 (Amicon Division) and the
MonoQ chromatography was repeated. Protein concentrations
were determined by the bicinchoninic acid method (BCA Protein
Assay Reagent, Pierce), using bovine serum albumin as the
standard. Proteins were analyzed by polyacrylamide gel
electrophoresis in the presence of sodium dodecyl sulfate
(SDS). Gels were either stained with Coomassie Brilliant Blue
or the proteins were electroblotted to nitrocellulose paper
which was probed with polyclonal rabbit antisera to LF or PE
(List Biological Laboratories, Campbell, CA). To determine
the percent of full length protein, SDS gels stained with
Coomassie Brilliant Blue were scanned with a laser
densitometer (Pharmacia-LKB Ultrascan XL).

The proteins migrated during gel electrophoresis with
molecular masses of more than 106 kDa, consistent with the
expected sizes, and immunoblots confirmed that the products
had reactivity with antisera to both LF and PE. The fusion
proteins differed in their susceptibility to proteolysis as
judged by the appearance of smaller fragments on immunoblots,
and this led to varying yields of final product. Thus, from
2-L cultures the yields were FP2, 27 µg; FP4, 87 µg; FP23,
[illegible] µg; and FP33, 143 µg.

Cell Culture Techniques and Protein Synthesis Inhibition Assay

CHO cells were maintained as monolayers in Eagle's
minimum essential medium (EMEM) supplemented with 10% fetal
bovine serum, 10 mM 4-(2-hydroxyethyl)-1-
piperazineethanesulfonic acid (HEPES) (pH 7.3), 2 mM
glutamine, penicillin/streptomycin, and non-essential amino
acids (GIBCO/BRL). Cells were plated in 24- or 48-well dishes
one day before the experiment. After overnight incubation,
the medium was replaced with fresh medium containing 1 µg/ml
of PA unless otherwise indicated. Fusion proteins were added
to 0.1-1000 ng/ml. All data points were done in duplicate.
Cells were further incubated for 20 hr at 37°C in a 5% CO₂
atmosphere. The medium was then aspirated and cells were
incubated for 2 hr at 37°C with leucine-free medium containing
1 µCi/ml [³H]leucine. Cells were washed twice with medium,
cold 10% trichloroacetic acid was added for 30 min, the cells
were washed twice with 5% trichloroacetic acid and dissolved
in 0.150 ml 0.1 M NaOH. Samples were counted in a Pharmacia-LKB
1410 liquid scintillation counter. In experiments to
determine if the toxin is internalized through acidified
endosomes, 1 µM monensin (Sigma) was added 90 min prior to
toxin and was present during all subsequent steps. To verify
that the fusion proteins were internalized through the PA
receptor, competition with native LF was carried out. PA (0.1
µg/ml) and LF (0.1-10,000 ng/ml) were added to the CHO cells
to block the PA receptor and the fusion proteins were added
thereafter at concentrations of 100 ng/ml for FP4 and FP23 and
5 ng/ml for FP33. Protein synthesis inhibition was measured
after 20 hr as described above.
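The text does not state how the 50% point was read from the duplicate [³H]leucine counts. One common approach, shown here purely as an assumed sketch, is to express each concentration as a percentage of the untreated control and interpolate linearly on a log-concentration axis. All counts below are made-up illustrative numbers, and the function names are hypothetical.

```python
import math
from statistics import mean


def percent_of_control(duplicate_cpm, control_cpm):
    """Mean of duplicate wells as a percentage of the untreated control."""
    return 100.0 * mean(duplicate_cpm) / control_cpm


def ec50_by_log_interpolation(concs_ng_ml, percents, target=50.0):
    """Concentration at which the response crosses `target`.

    `concs_ng_ml` must be ascending. Interpolation is linear in
    log10(concentration); returns None if the curve never crosses.
    """
    points = list(zip(concs_ng_ml, percents))
    for (c_lo, p_lo), (c_hi, p_hi) in zip(points, points[1:]):
        crosses = (p_lo - target) * (p_hi - target) <= 0
        if crosses and p_lo != p_hi:
            frac = (p_lo - target) / (p_lo - p_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None


if __name__ == "__main__":
    control_cpm = 20000.0                      # hypothetical untreated control
    concs = [0.1, 1.0, 10.0, 100.0]            # ng/ml, hypothetical series
    wells = [(20100, 19900), (15200, 14800), (5100, 4900), (150, 50)]
    percents = [percent_of_control(w, control_cpm) for w in wells]
    print(ec50_by_log_interpolation(concs, percents))
```

A four-parameter logistic fit would be the more rigorous choice when enough points are available; interpolation is used here only because it needs no fitting library.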

Cytotoxic Activity of the Fusion Proteins 

All four fusion proteins made and purified were toxic
to CHO cells. The concentrations causing 50% lysis of cultured
cells (EC₅₀ values) were 350, 8, 10, and 0.2 ng/ml for FP2,
FP4, FP23 and FP33, respectively (Table 1).
These assays were done with PA present at 1 µg/ml, exceeding
the [illegible] of 0.1 µg/ml (100 pM). The fusion proteins had no
toxicity even at 1 µg/ml when PA was omitted, proving that
internalization of the fusion proteins was occurring through
the action of PA and the PA receptor. Native LF has
previously been shown to have no short-term toxic effects on
CHO cells when added with PA, and therefore was not included
in these assays. The fusion protein having only domain III
and an altered carboxyl-terminus (FP33) was most active,
whereas the one having the intact domains II and III and the
native REDLK terminus (FP2) was least active. The other two
fusion proteins (FP4 and FP23) had intermediate potencies.




Among proteins having ADP-ribosylation activity,
potencies equalling or exceeding 1 pM have previously been
found only for native diphtheria and Pseudomonas toxins acting
on selected cells (Middlebrook, J. L. and Dorland, R. B. Can. J.
Microbiol. 23:183-189, 1977) and for fusion proteins of PE and
diphtheria toxin when tested on cells containing > 100,000
receptors for the ligand-recognition domain of the fusion
(EGF, transferrin, etc.) (Pastan, I. and FitzGerald, D.
Science 254:1173-1177, 1991; Middlebrook, et al. 1977). For
CHO cells, the potency of FP33 (EC₅₀ = 2 pM) is higher than
that of PE itself (EC₅₀ = 420 pM), even though CHO cells
probably have similar numbers of receptors for both PA and PE
(approx. 5,000-20,000). If the intracellular trafficking of
native PE delivers less than 5% of the molecules to the
cytosol, then the 200-fold greater potency of FP33 suggests
that the PA/LF system has an inherently high efficiency of
delivery to the cytosol.
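The fold differences quoted in this discussion follow directly from ratios of the reported EC₅₀ values. The short sketch below reproduces that arithmetic. The dictionaries simply restate values given in the text, and the helper name is hypothetical.

```python
# EC50 values (ng/ml) reported in the text for CHO cells with PA at 1 ug/ml.
EC50_NG_PER_ML = {"FP2": 350.0, "FP4": 8.0, "FP23": 10.0, "FP33": 0.2}

# EC50 values (pM) quoted in the text for FP33 and native PE.
EC50_PM = {"FP33": 2.0, "PE": 420.0}


def fold_more_potent(reference_ec50, test_ec50):
    """How many times lower the test EC50 is than the reference (same units)."""
    return reference_ec50 / test_ec50


if __name__ == "__main__":
    pe_vs_fp33 = fold_more_potent(EC50_PM["PE"], EC50_PM["FP33"])
    fp2_vs_fp23 = fold_more_potent(EC50_NG_PER_ML["FP2"], EC50_NG_PER_ML["FP23"])
    print(f"FP33 vs native PE (molar): {pe_vs_fp33:.0f}-fold")
    print(f"FP23 vs FP2 (mass): {fp2_vs_fp23:.0f}-fold")
```

The molar ratio of 420 pM to 2 pM is 210, which the text rounds to 200-fold. Ratios taken on a mass basis differ slightly from molar ratios because the proteins differ in size.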

A comparison of the potencies of the four fusion
proteins shows that inclusion of domain II decreases potency.
Thus, the fusion with the lowest potency, FP2, was the one
containing intact domains II, Ib, and III. In designing the
fusion proteins, all or part of PE domains II and Ib was
included in several of the constructs because it could not be
assumed that the translocation functions possessed by PA and
LF would be able to correctly traffic PE domain III to the
cytosol. The combination of domains II, Ib, and III, termed
PE40, has been used in a large number of toxic hybrid
proteins, by fusion to growth factors, monoclonal antibodies,
and other proteins (Pastan et al. 1991; Oeltmann, T. N. and
Frankel, A. E. FASEB J. 5:2334-2337, 1991), and some of these
fusions have shown substantial potency. Domain II was found
to be essential in these hybrid proteins to provide a
translocation function not present in the receptor-binding
domain to which it was fused. The potency of many of these
PE40 fusion proteins appears to require that they be
trafficked through the Golgi and ER and proteolytically
activated in the same manner as native PE, so as to achieve
delivery of domain III to the cytosol. The fact that
inclusion of the entire domain II in the LF fusion protein FP2
instead decreased activity suggests that internalization of
the LF fusions occurs through a different route, one that does
not easily accommodate all the sequences in domain II.

Evidence that structures within PE residues 251-278
inhibit translocation of the LF fusions comes from the 35-fold
lower potency of FP2 compared to FP23. One structure that
might inhibit translocation of the fusions is the disulfide
loop formed by Cys265 and Cys287. In native PE, this
disulfide loop appears to be required for maximum activity.
Thus, native PE and TGF-α-PE40 fusions become 10- to 100-fold
less toxic if one or both of these cysteines are changed to
serine. The disulfide loop probably acts to constrain the
polypeptide so that Arg276 and Arg279 are susceptible to the
intracellular protease involved in the cleavage that precedes
translocation. In contrast, the disulfide loop decreases the
potency of the LF fusions, perhaps by preventing the unfolding
needed for passage through a protein channel, thereby acting
in this situation as a "stop transfer" sequence. FP23, which
lacks Cys265, would not contain the domain II disulfide, and
therefore would not be subject to this effect. LF, like PA
and EF, contains no cysteines, and would not be prevented by
disulfide loops from the complete unfolding needed to pass
through a protein channel. The suggestion that disulfide
loops act as stop-transfer signals would predict that the
disulfide Cys372-Cys379 in PE domain Ib, which is retained in
all four LF fusions, would also decrease potency. It should be
noted that neither the fusions made here nor the PE40 fusions
have been analyzed chemically to determine if the disulfides
in domains II and III are actually formed. If the disulfides
do form correctly, it would be predicted that the potencies of
all of the fusion proteins, and especially that of FP2, would
be increased by treatment with reducing agents. These
analyses have not yet been performed. This analysis also
suggests that future LF fusions might be made more potent by
omission of domain Ib.

The other structural feature of PE known to affect
intracellular trafficking is the carboxyl terminal sequence,
REDLK, that specifies retention in the ER (Chaudhary et al.
1990; Munro et al. 1987). To determine if the trafficking of
the LF fusion proteins was similar to that of PE, two of the
fusion proteins were designed so as to differ only in the
terminal sequence. Replacement of the native sequence by
LDER, one that does not function as an ER retention signal,
produced the most toxic of the four fusion proteins, FP33.
FP4, identical except that it retained a functional REDLK
sequence, was 30-fold less potent. These data suggest that
sequestration of the REDLK-ended fusions decreased their
access to cytosolic EF-2. The implication is that PE may
require the REDLK terminus to be delivered to the ER for an
obligatory processing step, but then be limited in its final
toxic potential by sequestration from its cytosolic target.
Finally, this comparison strongly argues that internalization
of the LF fusions does not follow the same path as PE.

In designing the fusion proteins described here it was
hoped that they would have cytotoxic activity against cells
that are unaffected by anthrax lethal toxin, and this was
successfully realized as shown by the data obtained with CHO
cells. However, prior knowledge about LF did not provide a
basis for predicting whether the constructs would retain
toxicity toward mouse macrophages, the only cells known to be
rapidly killed by anthrax lethal toxin. Macrophages are lysed
by lethal toxin in 90-120 minutes, long before any inhibition
of protein synthesis resulting from ADP-ribosylation of EF-2
leads to decreases in membrane integrity or viability. This
kinetic difference made it possible to test directly for LF
action. As discussed above, the fusion proteins purified to
remove the ~89-kDa LF species formed by proteolysis were not
toxic to J774A.1 macrophages. This shows that attachment of a
bulky group to the carboxyl terminus of LF eliminates its
normal toxic activity. In the absence of any assay for the
putative catalytic activity of LF, it is not possible to
determine the cause of the loss of LF activity. The inability
of the fusions to lyse J774A.1 cells also argues against
proteolytic degradation of the fusions either in the medium
during incubation with cells or after internalization.

An important result of the invention described here is
the demonstration that the anthrax toxin proteins constitute
an efficient mechanism for protein internalization into animal
cells. The high potency of the present fusion proteins argues
that this system is inherently efficient, as well as being
amenable to improvement. The high efficiency results in part
from the apparent direct translocation from the endosome,
without a requirement for trafficking through other
intracellular compartments. In addition to its efficiency,
the system appears able to tolerate heterologous polypeptides.
Macrophage Lysis Assay of Fusion Proteins 

Fusion proteins were assayed for LF functional
activity on the J774A.1 macrophage cell line in the presence of
1 µg/ml PA. One day prior to use, cells were scraped from
flasks and plated in 48-well tissue culture dishes. For
cytotoxicity tests, the medium was aspirated and replaced with
fresh medium containing 1 µg/ml PA and the LF fusion proteins,
and the cells were incubated for 3 hr. All data points were
performed in duplicate. To measure the viability of the
treated cells, 3-[4,5-dimethylthiazol-2-yl]-2,5-
diphenyltetrazolium bromide (MTT) was added to the cells to a
final concentration of 0.5 mg/ml, and incubation was continued
for an additional 45 min to allow the uptake and reduction of
MTT by viable cells. Medium was aspirated and replaced by
200 µl of 0.5% SDS, 40 mM HCl, 90% isopropanol and the plates
were vortexed to dissolve the blue pigment. The MTT
absorption was read at 570 nm using a UVmax Kinetic Microplate
Reader (Molecular Devices Corp.).
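The text reports only the raw absorbance reading. A conventional way to turn duplicate MTT readings into a viability figure, offered here as an assumed sketch rather than the calculation actually used, is to express the treated wells as a percentage of untreated wells after subtracting a reagent blank. The function name and all numbers are hypothetical.

```python
from statistics import mean


def percent_viability(treated_abs, untreated_abs, blank_abs=0.0):
    """Viability of treated wells relative to untreated wells, in percent.

    `treated_abs` and `untreated_abs` are sequences of replicate absorbance
    readings; `blank_abs` is an optional reagent-only background reading.
    """
    reference = mean(untreated_abs) - blank_abs
    if reference <= 0:
        raise ValueError("untreated signal must exceed the blank")
    return 100.0 * (mean(treated_abs) - blank_abs) / reference


if __name__ == "__main__":
    # Hypothetical duplicate readings, for illustration only.
    print(percent_viability([0.30, 0.34], [1.20, 1.36]))
```

Lysis by lethal toxin would appear as a low percentage, while a protein with no LF activity should leave the value close to that of untreated wells.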

The crude periplasmic extracts from which the fusion
proteins were purified caused lysis of J774A.1 macrophages
when added with PA, indicating the presence of active LF
species, probably formed by proteolysis of the fusion
proteins. Purification removed this activity, so that none of
the final fusion proteins had this activity. This result
showed both that the purified proteins were devoid of full
size LF or active LF fragments, and that the lytic activity of
LF for macrophages is blocked when residues from PE are fused
at its carboxyl terminus.
ADP-Ribosylation Assays

For assaying ADP-ribosylation activity, the method of
Collier and Kandel (Collier, R. J. and Kandel, J. J. Biol.
Chem. 246:1496-1503, 1971) was used with some modification. A
wheat germ extract enriched for EF-2 was used in the reaction.
Briefly, in a 200-µL reaction assay, 20 µL of buffer
(500 mM Tris, 10 mM EDTA, 50 mM dithiothreitol and
10 mg/ml bovine serum albumin) was mixed with 30 µL of EF-2,
130 µL of H₂O or sample, and 20 µL of [adenylate-³²P]NAD (0.4
µCi per assay, ICN Biochemicals) containing 5 µM of non-
radioactive NAD. Samples were incubated for 20 min at 23°C,
the reactions were stopped by adding 1 ml 10% trichloroacetic
acid, and the precipitates were collected and washed on GA-6
filters (Gelman Sciences). The filters were washed twice with
70% ethanol, air dried, and the radioactivity measured.
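The component volumes listed for the ADP-ribosylation reaction can be checked against the stated 200 µL total, and the final concentration of each buffer ingredient follows from simple dilution. The sketch below performs that arithmetic with the values given in the text. The dictionary keys and function name are illustrative only.

```python
# Component volumes (microlitres) as listed in the assay description.
REACTION_UL = {
    "buffer": 20.0,
    "EF-2 (wheat germ extract)": 30.0,
    "H2O or sample": 130.0,
    "[adenylate-32P]NAD": 20.0,
}
TOTAL_UL = 200.0

# Stock concentrations of the buffer ingredients, as stated in the text.
BUFFER_STOCK = {
    "Tris (mM)": 500.0,
    "EDTA (mM)": 10.0,
    "dithiothreitol (mM)": 50.0,
    "bovine serum albumin (mg/ml)": 10.0,
}


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration after diluting `volume_ul` of stock into `total_ul`."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    assert sum(REACTION_UL.values()) == TOTAL_UL
    for name, stock in BUFFER_STOCK.items():
        print(name, final_concentration(stock, REACTION_UL["buffer"]))
```

Because 20 µL of buffer goes into 200 µL, every buffer ingredient ends up at one tenth of its stock concentration.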

Table 1 shows that all the fusion proteins were
equally capable of ADP-ribosylation of EF-2. FP2, which had
little cytotoxic activity on CHO cells, still retained full
ADP-ribosylation activity. It was also found that treatment
with urea and dithiothreitol, under conditions that activate
the enzymatic activity of native PE, caused no increase in the
ADP-ribosylation activity of the fusion proteins, suggesting
that the proteins were not folded so as to sterically block
the catalytic site.

Effect of Mutant PA on LF-PE Activity

To verify that uptake of the fusion proteins requires
PA, the activity of the fusion proteins was measured in the
presence of a mutant PA which is apparently defective in
internalization. This mutant, PA-S395C, has a serine to
cysteine substitution at residue 395 of the mature protein,
and retains the ability to bind to receptor, become
proteolytically nicked, and bind LF, but is unable to lyse
macrophages. When PA-S395C was substituted for native PA in
combination with FP33, no inhibition of protein synthesis
was observed. Similar results were obtained when
the other three fusion proteins were tested in combination
with PA-S395C.
Effect of Monensin on Activity of the Fusion Proteins

To verify that internalization of the fusion proteins
was occurring by passage through acidified endosomes in the
same manner as native LF, the ability of monensin to protect
cells was examined. Addition of monensin to 1 µM decreased
the potency of FP33 by >100-fold. Protection against the
other three fusion proteins exceeded 20-fold.
LF Block of LF-PE Fusion Activity

To further verify that the fusion proteins were
internalized through the PA receptor, CHO cells were incubated
with PA and different amounts of LF to block the receptor and
the fusion proteins were added thereafter. Protein synthesis
inhibition assays showed that native LF could competitively
block LF-PE fusion proteins in a concentration-dependent
manner.

The present data suggest that the receptor-bound 63-
kDa proteolytic fragment of PA forms a membrane channel and
that regions at or near the amino-termini of LF and EF enter
this channel first and thereby cross the endosomal membrane,
followed by unfolding and transit of the entire polypeptide to
the cytosol. This model differs from that for diphtheria
toxin in that the orientation of polypeptide transfer is
reversed. Since both EF and LF have large catalytic domains,
extending to near their carboxyl termini, it appears probable
that the entire polypeptide crosses the membrane. In the LF
fusion proteins, the attached PE sequences would be carried
along with the LF polypeptide in transiting the channel to the
cytosol. Thus, the PA63 protein channel must tolerate diverse
amino acid residues and sequences. The data presented are
consistent with the mechanism of direct translocation of the
LF proteins to the cytosol as suggested herein.

TABLE 1  Cytotoxic and catalytic activity of LF-PE fusion proteins

           Amino acid content          Toxicity (EC₅₀)ᵇ     ADP-ribosylation
Protein    LF      Linker   PE         (pM)      (ng/ml)    activity (relative)
------------------------------------------------------------------------------
PE         none    none     1-613       420       23         100ᶜ
FP2        776     TR       251-613    2700      350         [illegible]
FP4        776     TR       362-613      65        8         105
FP23       776     TR       279-613      70       10         108
FP33       776     TR       362-612ᵃ      2        0.2       118

ᵃ REDLK at carboxyl terminus is changed to LDER.
ᵇ Data are from this example, except for native PE, which is
from data not shown, and is equal to a value previously
reported (Moehring, T. J. and Moehring, J. M. Cell 11:447-454,
1977).
ᶜ ADP-ribosylation was measured using 30 ng of fusion protein
in a final volume of 0.200 ml with 5 µM NAD. Results were
corrected for the molecular weights of the proteins and
normalized to PE.
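Table 1 reports toxicity in both molar and mass units and states that the ADP-ribosylation results were corrected for molecular weight and normalized to PE. The molecular weights used for those corrections are not given in this section, so the sketch below only illustrates the arithmetic. The function names are hypothetical, and any mass supplied to them is the caller's assumption.

```python
def ng_per_ml_to_pm(ng_per_ml, mass_kda):
    """Convert a mass concentration to picomolar for a given molecular mass.

    1 ng/ml equals 1e-6 g/L, and 1 kDa corresponds to 1000 g/mol, so the
    molar concentration in pM is ng_per_ml * 1000 / mass_kda.
    """
    return ng_per_ml * 1000.0 / mass_kda


def pm_to_ng_per_ml(pm, mass_kda):
    """Inverse of ng_per_ml_to_pm."""
    return pm * mass_kda / 1000.0


def relative_molar_activity(signal, mass_kda, ref_signal, ref_mass_kda):
    """Activity per mole relative to a reference protein, in percent.

    Assumes equal masses of sample and reference were assayed, so moles
    assayed are inversely proportional to molecular mass.
    """
    return 100.0 * (signal * mass_kda) / (ref_signal * ref_mass_kda)


if __name__ == "__main__":
    # Hypothetical 100-kDa protein at 1 ng/ml, for illustration only.
    print(ng_per_ml_to_pm(1.0, 100.0))
```

Values computed this way from estimated masses will not necessarily reproduce the rounded figures in Table 1 exactly.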
EXAMPLE 2: Residues 1-254 of Anthrax Toxin Lethal Factor are
Sufficient to Cause Cellular Uptake of Fused Polypeptides
Reagents and General Procedures

Restriction endonucleases and DNA modifying enzymes
were purchased from GIBCO/BRL, Boehringer Mannheim or New
England Biolabs. Low melting point agarose (Sea Plaque) was
obtained from FMC Corporation. Oligonucleotides were
synthesized on a PCR Mate (Applied Biosystems) and purified
with Oligonucleotide Purification Cartridges (Applied
Biosystems). Polymerase chain reactions (PCR) were performed
on a thermal cycler (Perkin-Elmer-Cetus) using reagents from
U. S. Biochemical Corp. or Perkin-Elmer-Cetus. DNA was
amplified as described in Example 1. The DNA was sequenced to
confirm the accuracy of all of the constructs described in
the report. SEQUENASE version 2.0 from U. S. Biochemical
Corp. was utilized for the sequencing reactions, and DNA
sequencing gels were made with Gel Mix 8 from GIBCO/BRL.
[³⁵S]dATPαS and L-[3,4,5-³H]leucine were purchased from
Dupont-New England Nuclear. Chinese hamster ovary cells (CHO)
were obtained from Michael Gottesman (NCI, NIH). J774A.1
macrophage cells were obtained from American Type Culture
Collection.
Plasmid Construction 

Three types of LF protein constructs were made and
analyzed in this report. All the constructs were made by PCR
amplification of the desired sequences, using the native LF
gene as template. LF proteins deleted at the amino- or
carboxyl-terminus were constructed by a single PCR
amplification reaction that added restriction sites at the
ends for incorporation of the construct into the expression
vector. LF proteins deleted for one or more of the 19-amino
acid repeats that comprise residues 308-383 were constructed
by ligating the products of two separate PCR reactions that
amplified the regions bracketing the deletion. The third
group of constructs were fusions of varying portions of the
amino terminus of LF with PE domains Ib and III. Like the
internally-deleted LF proteins, these LF-PE fusions were also
made by ligation of two separate PCR products. In the latter
two types of constructs, the ligation of the PCR products
resulted in addition of a linker, ACGCGT, at the junction
points. This introduced two non-native residues, Thr-Arg,
between the fused domains. The PCR manipulations also added
three non-native amino acids, Met-Val-Pro, as an extension to
the native amino terminus on all the constructs described in
this report. Addition of this sequence is not likely to alter
the activity of the constructs (discussed below). It should
be noted that the LF-PE fusions described herein contain this
three-residue extension.
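The relationship between the ACGCGT linker and the Thr-Arg insertion can be verified by translating the six bases with the standard genetic code. The sketch below builds the standard codon table and also checks that the sequence is its own reverse complement, as expected for the MluI recognition site. The function names are illustrative, and the reading frame (linker translated from its first base) is assumed from the statement that the linker encodes Thr-Arg.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_complement(dna):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


if __name__ == "__main__":
    linker = "ACGCGT"  # linker sequence given in the text
    print(translate(linker))                     # one-letter codes for the linker
    print(reverse_complement(linker) == linker)  # palindromic restriction site
```

ACG encodes threonine and CGT encodes arginine, so joining two fragments at this site in frame inserts the two residues described.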

For PCR reactions to make deletions of 40 and 78 amino
acids from the amino-terminus of LF, two different mutagenic
oligonucleotide primers were made which were substantially
identical to the LF gene template at the intended new termini,
and which added KpnI sites at their 5' ends. Another
(non-mutagenic) oligonucleotide primer for introduction of a
BamHI site at the 3' end of LF was prepared. Similarly, to
make deletions at the carboxyl-terminus of LF, two different
mutagenic primers were used which truncated LF at residues 729
and 693 and introduced a BamHI site next to the new 3' ends of
the LF gene. A second (non-mutagenic) oligonucleotide primer
specific for the amino terminus of LF was made which
introduced a KpnI site at the 5' end of the gene. All of the
primers noted above were used in PCR reactions on a pLF7
template (Robertson and Leppla, 1986) to synthesize DNA
fragments having KpnI and BamHI sites at their 5' and 3' ends,
respectively. The amplified LF DNAs containing the amino- and
carboxyl-terminal deletions were digested with the appropriate
restriction enzymes. The expression vector pVEX115f+T
(provided by V. K. Chaudhary, NCI, NIH) was cleaved
sequentially with KpnI and BamHI and dephosphorylated. This
expression vector contains a T7 promoter, an OmpA signal
sequence for protein transport to the periplasm, a multiple
cloning site that includes KpnI and BamHI sites, and a T7
transcription terminator. The LF and pVEX115f+T DNA fragments
were purified from low melting point agarose, ligated
overnight, and transformed into E. coli DH5α. Transformants
were screened by restriction digestion to identify the desired
recombinant plasmids. Proteins produced by these constructs
are designated according to the amino acid residues retained;
for example the LF truncated at residue 693 is designated
LF¹⁻⁶⁹³. All of the mutant LF proteins described above contain
three non-native amino acids, Met-Val-Pro, added to the amino-
terminus as a result of the PCR manipulations.

To analyze the role of the repeat region of LF, four
different constructs were made: (1) removal of the entire
repeat region (LF¹⁻³⁰⁷.TR.LF³⁸⁴⁻⁷⁷⁶), (2) removal of the first
repeat (LF¹⁻³⁰⁷.TR.LF³²⁷⁻⁷⁷⁶), (3) removal of the last repeat
(LF¹⁻³⁶⁴.TR.LF³⁸⁴⁻⁷⁷⁶), and (4) removal of repeats 2-4
(LF¹⁻³²⁶.TR.LF³⁸⁴⁻⁷⁷⁶). To construct LF¹⁻³⁰⁷.TR.LF³⁸⁴⁻⁷⁷⁶, four
different primers were used in two separate PCR reactions. To
amplify LF¹⁻³⁰⁷, one oligonucleotide primer was made at the 5'
end of the LF gene which added a KpnI site, and a second
primer was constructed at the end of residue 307, introducing
an MluI site. For amplifying LF³⁸⁴⁻⁷⁷⁶, a third primer was
made at residue 384 with an added MluI site, and the fourth
primer was made at residue 776 which introduced a BamHI
site at the end. Two PCR amplifications were done using
primers one/two and three/four with pLF7 as template
(Robertson and Leppla, 1986). The first amplification
reaction was digested with KpnI and MluI separately, and the
second amplification reaction was digested with MluI and
BamHI. The expression vector pVEX115f+T was digested
separately with KpnI and BamHI and dephosphorylated. All
three fragments were gel purified, ligated overnight at 16°C
and transformed into E. coli DH5α. The other three constructs
were made by similar strategies. Oligonucleotide primers one
and four were the same for all four constructs, whereas
primers two and three were changed accordingly. All four
constructs contain Met-Val-Pro at the amino terminus of LF and
Thr-Arg at the site of the repeat region deletion.
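The four repeat-deletion constructs follow a regular pattern if residues 308-383 are treated as four contiguous 19-residue repeats, which is an assumption inferred here from the construct names rather than stated explicitly. Under that assumption, the sketch below derives each construct name from the repeats removed. The helper names are hypothetical, and names are written in plain ASCII in place of the superscripted residue ranges.

```python
REPEAT_REGION = (308, 383)   # residues comprising the repeats, per the text
REPEAT_LENGTH = 19           # residues per repeat, per the text
LF_LAST_RESIDUE = 776        # mature LF length, per the text


def repeat_bounds():
    """Inclusive (start, end) of each repeat, assuming contiguous repeats."""
    first, last = REPEAT_REGION
    return [(s, s + REPEAT_LENGTH - 1) for s in range(first, last + 1, REPEAT_LENGTH)]


def deletion_construct(first_repeat, last_repeat):
    """Name of the construct lacking repeats first_repeat..last_repeat (1-based)."""
    bounds = repeat_bounds()
    deleted_start = bounds[first_repeat - 1][0]
    deleted_end = bounds[last_repeat - 1][1]
    return f"LF(1-{deleted_start - 1}).TR.LF({deleted_end + 1}-{LF_LAST_RESIDUE})"


if __name__ == "__main__":
    print(repeat_bounds())
    for span in [(1, 4), (1, 1), (4, 4), (2, 4)]:
        print(span, deletion_construct(*span))
```

The names generated for the four cases match the residue ranges given for constructs (1) through (4), which supports the assumed repeat boundaries.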

To construct LF-PE fusion proteins, fragments of the
LF gene extending from the amino terminus to various lengths
were amplified from plasmid pLF7 (Robertson and Leppla, 1986)
by PCR using a common oligonucleotide primer that added a KpnI
site at the 5' end and mutagenic primers which added MluI
sites at the intended new 3' ends. The PCR products of the LF
gene were digested with KpnI, the DNAs were precipitated, and
subsequently digested with MluI. Domains Ib and III of the PE
gene (provided by David FitzGerald, NCI, NIH) were amplified
by PCR using primers which added MluI and EcoRI sites at the
5' and 3' ends, respectively. The PCR product of PE was
digested with MluI and EcoRI. Similarly, the expression
vector pVEX115f+T was digested with KpnI and EcoRI. All DNA
fragments were purified from low-melting agarose gels,
three-fragment ligations were carried out, and the products
were transformed into E. coli DH5α. The three constructs
described in this example have 254, 198 and 79 amino acids of
LF joined with PE domains Ib and III. These fusion proteins
are designated LF¹⁻²⁵⁴.TR.PE³⁶²⁻⁶¹³ (SEQ ID NO:10),
LF¹⁻¹⁹⁸.TR.PE³⁶²⁻⁶¹³, and LF¹⁻⁷⁹.TR.PE³⁶²⁻⁶¹³, respectively. The
proteins retain the native carboxyl-terminal sequence of PE,
REDLK. It should be noted that these abbreviations do not
specify the entire amino acid content of the proteins, because
all the constructs also contain Met-Val-Pro, which was added
to the amino-terminus of the LF domain by the PCR
manipulations.

Expression and Purification of Deleted LF and Fusion Proteins 

Recombinant plasmids were transformed into E. coli
SA2821 (provided by Sankar Adhya, NCI, NIH), a derivative of
BL21(λDE3) (Studier and Moffatt, 1986) that lacks the
proteases encoded by the lon, ompT, and degP genes, and has
the T7 RNA polymerase gene under control of the lac promoter
(Strauch et al., 1989). Transformants were grown in super
broth with 100 µg/ml ampicillin, with shaking at 225 rpm,
37°C, in 2-L cultures. When A₆₀₀ reached 0.8-1.0, isopropyl-
1-thio-β-D-galactopyranoside was added to a final
concentration of 1 mM and cultures were incubated for an
additional 2 h. EDTA and 1,10-o-phenanthroline were added to
5 and 0.1 mM, respectively, and periplasmic protein was
extracted as described in Example 1. The supernatant fluids
were concentrated by Centriprep-30 units (Amicon) and proteins
were purified to near homogeneity by gel filtration
(Sephacryl S-200, Pharmacia-LKB) and anion exchange
chromatography (MonoQ, Pharmacia-LKB) as described in Example
1. To determine the percentage of full length protein, SDS
gels stained with Coomassie Brilliant Blue were scanned with a
laser densitometer (Pharmacia-LKB Ultrascan XL). Western
blots were performed as described previously (Singh et al.,
1991).

The LF proteins having terminal deletions and the LF-
PE fusion proteins were obtained from periplasmic extracts and
purified to near homogeneity by gel filtration and anion
exchange chromatography. The migration of the proteins was
consistent with their expected molecular weights. Immunoblots
confirmed that the LF proteins had reactivity with LF
antisera, and the LF-PE fusion proteins had reactivity with
both LF and PE antisera. Fusion proteins and terminally-
deleted LF proteins differed in their susceptibility to
proteolysis as judged by the appearance of peptide fragments
on the immunoblots, and this was also reflected in the
different amounts of purified proteins obtained. Thus, from
2-L cultures the yields of purified proteins were LF⁴¹⁻⁷⁷⁶,
39 µg; LF⁷⁹⁻⁷⁷⁶, 32 µg; LF¹⁻⁷²⁹, 50 µg; LF¹⁻⁶⁹³, 46 µg;
LF¹⁻²⁵⁴.TR.PE³⁶²⁻⁶¹³, 184 µg; LF¹⁻¹⁹⁸.TR.PE³⁶²⁻⁶¹³, 80 µg;
LF¹⁻⁷⁹.TR.PE³⁶²⁻⁶¹³, 127 µg.

LF proteins deleted in the repeat region were found to
be unstable and full size product could not be purified.
Therefore, the activities of these proteins were determined by
assay of crude periplasmic extracts, and immunoblots were used
to estimate the amount of the full size proteins present.
Cytotoxicity on Macrophages of LF Proteins Having Terminal and 
Internal Deletions 

Deleted LF proteins were assayed for LF functional
activity on the J774A.1 macrophage cell line in the presence
of native PA as described in Example 1. Briefly, cells were
plated in 24- or 48-well dishes in Dulbecco's modified Eagle
medium (DMEM) containing 10% fetal bovine serum, and allowed
to grow for 18 h. PA (1 µg/ml) and the mutant LF proteins
were added and cells were incubated for 3 h. To measure the
viability of the treated cells,
3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide
(MTT) was added to the cells
to a final concentration of 0.5 mg/ml. After incubating for
45 min, the medium was aspirated and cells were dissolved in
90% isopropanol, 0.5% SDS, 40 mM HCl, and read at 540 nm using
a UVmax Kinetic Microplate Reader (Molecular Devices Corp.).
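
By way of illustration only, the absorbance readout described above can be reduced to a percentage of the untreated control as sketched below in Python. The function name, the optional blank correction, and the absorbance values in the usage line are assumptions made for this sketch; they are not taken from the procedure of Example 1.

```python
def percent_viability(a540_treated, a540_untreated, a540_blank=0.0):
    """Express the MTT signal of a treated well as a percentage of the
    untreated control well, after subtracting an optional blank reading.

    All three arguments are absorbance values at 540 nm (hypothetical
    inputs; the blank correction is an assumption of this sketch).
    """
    signal = a540_treated - a540_blank
    reference = a540_untreated - a540_blank
    if reference <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return 100.0 * signal / reference


# Hypothetical readings: toxin-treated well, untreated well, reagent blank.
viability = percent_viability(0.30, 1.10, 0.10)
```

A low percentage in wells receiving PA plus an LF protein would correspond to macrophage lysis, whereas values near 100% would correspond to an inactive protein.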
To determine the extent of essential sequences at the
amino terminus of LF, the toxicities of the two LF proteins
deleted at the amino-terminus were measured in combination
with PA in the macrophage lysis assay. Purified LF⁴¹⁻⁷⁷⁶ and
LF⁷⁹⁻⁷⁷⁶ were unable to lyse J774A.1 macrophage cells. This
indicates that some portion of the sequence preceding residue
41 is needed to maintain an active LF protein.

To examine the role of the carboxyl terminus of LF,
two proteins truncated in this region were prepared and
analyzed. The proteins LF¹⁻⁶⁹³ and LF¹⁻⁷²⁹ were assayed on
J774A.1 cells and found to be inactive. This is presumed to
be due to inactivation of the putative catalytic domain.

To begin study of the role of the repeat region of LF,
four constructs were made having deletions in this region.
The proteins expressed from these mutants were unstable. Of
the four deleted proteins, only LF¹⁻³⁰⁷.TR.LF³²⁷⁻⁷⁷⁶ had
immunoreactive material at the position expected of intact
fusion protein. The amount of intact LF¹⁻³⁰⁷.TR.LF³²⁷⁻⁷⁷⁶ was
similar to that of native LF expressed in the same vector.
When these unpurified periplasmic extracts were tested in
J774A.1 macrophages, only the native LF control was toxic.
LF¹⁻³⁰⁷.TR.LF³²⁷⁻⁷⁷⁶ did not lyse macrophages even when present
at 50-fold higher concentration than that of crude periplasmic
protein of LF. Conclusions cannot be drawn about the
toxicities of the other three constructs because full size
fusion proteins were not present in the periplasmic extracts.

Cell Culture Techniques and Protein Synthesis Inhibition Assay 
of Fusion Proteins 

CHO cells were maintained as monolayers in α-modified
minimum essential medium (α-MEM) supplemented with 5% fetal
bovine serum, 10 mM HEPES (pH 7.3), and
penicillin/streptomycin. Protein synthesis assays were
carried out in 24- or 48-well dishes as described in Example 1.
CHO cells were incubated with PA (0.1 µg/ml) and varying
concentrations of LF, which is expected to block the receptor.
Fusion proteins were added at fixed concentrations, as
follows: FP4, 100 ng/ml; FP23, 100 ng/ml; and FP33, 5 ng/ml.
Cells were incubated for 20 hr and protein synthesis
inhibition was evaluated by [³H]leucine incorporation.
Cytotoxicity of the LF-PE Fusion Proteins on CHO Cells 

The use of fusion proteins provides a more defined
method for measuring the translocation of LF, as demonstrated
in Example 1 showing that fusions of LF with domains Ib and
III of PE are highly toxic. Translocation of these fusions
is conveniently measured because domain III blocks protein
synthesis by ADP-ribosylation of elongation factor 2. The new
fusions containing varying portions of LF fused to PE domains
Ib and III were designed to identify the minimum LF sequence
able to promote translocation. The EC₅₀ of LF¹⁻²⁵⁴.TR.PE³⁶²⁻⁶¹³
(SEQ ID NO: 10) was 1-7 ng/ml, whereas LF¹⁻¹⁹⁸.TR.PE³⁶²⁻⁶¹³ and
LF¹⁻⁷⁹.TR.PE³⁶²⁻⁶¹³ did not kill 50% of the cells even at a
1200-fold higher concentration. Other constructs containing
larger portions of LF fused to PE domains Ib and III were also
made and analyzed, and were found to be equal in potency
to LF¹⁻²⁵⁴.TR.PE³⁶²⁻⁶¹³. These results show that residues 1-254
contain all the sequences essential for binding to PA63. The
fusion proteins had no toxicity in the absence of PA, proving
that their internalization absolutely requires interaction
with PA.
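
An EC₅₀ of the kind reported above could, for example, be estimated from a dose-response series by log-linear interpolation between the two doses that bracket 50% of control incorporation. The following Python sketch shows one such approach and is not asserted to be the method actually used; the function name and the example data are hypothetical.

```python
import math


def ec50_by_interpolation(concentrations, responses, target=50.0):
    """Estimate the concentration giving `target` percent response.

    concentrations: ascending doses (e.g. ng/ml), all greater than zero.
    responses: percent of control incorporation at each dose.
    Returns None when the series never crosses the target, which is the
    situation described for constructs that fail to kill 50% of cells.
    """
    points = list(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        crosses = (r_lo - target) * (r_hi - target) <= 0
        if crosses and r_lo != r_hi:
            # Fraction of the way from the lower to the higher dose,
            # measured on a log10 concentration axis.
            frac = (r_lo - target) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_c
    return None


# Hypothetical dose-response data, not taken from the experiments above.
doses = [0.1, 1.0, 10.0, 100.0]
incorporation = [95.0, 75.0, 25.0, 5.0]
estimate = ec50_by_interpolation(doses, incorporation)
```

A four-parameter logistic fit would normally be preferred when enough data points are available; interpolation is shown here only because it needs no fitting library.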

Binding of Fusion Proteins and Deleted LF Proteins to PA 

Binding of LF proteins to cell bound PA was determined
by competition with radiolabeled ¹²⁵I-LF. Native LF was
radiolabeled (3.1 × 10⁶ cpm/µg protein) using the
Bolton-Hunter reagent. Binding studies employed the L6 rat
myoblast cell line, which has approximately twice as many
receptors as the J774A.1 macrophage line (Singh et al., 1989).
For convenience, cells were chemically fixed by a gentle
procedure that preserves the binding activity of the receptor
as well as the ability of the cell-surface protease to cleave
PA to produce receptor-bound PA63. Assays were carried out
in 24-well dishes using cells plated in DMEM with 10% fetal
bovine serum one day before the experiment. Cell monolayers
were washed twice with Hanks' balanced salt solution (HBSS)
containing 25 mM HEPES and were chemically fixed for 30 min at
23°C in 10 mM N-hydroxysuccinimide and 30 mM
1-ethyl-3-[3-dimethylaminopropyl]carbodiimide, in buffer
containing 10 mM HEPES, 140 mM NaCl, 1 mM CaCl₂, and 1 mM MgCl₂.
Monolayers were washed with HBSS containing 25 mM HEPES and
the fixative was inactivated by incubating 30 min at 23°C in
DMEM (without serum) containing 25 mM HEPES. Native PA was
added at 1 µg/ml in minimum essential medium containing Hanks'
salts, 25 mM HEPES, 1% bovine serum albumin, and a total of
4.5 mM NaHCO₃. Cells were incubated overnight at room
temperature to allow binding and cleavage of PA. Cells were
washed twice in HBSS and mutant LF proteins (0-5000 ng/ml)
along with 50 ng/ml ¹²⁵I-LF were added to each well. Cells
were further incubated for 5 h, washed three times in HBSS,
dissolved in 0.5 ml 1 N NaOH, and counted in a gamma counter
(Beckman Gamma 9000).
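
As a worked illustration of how the gamma counts from such an assay might be interpreted, the Python sketch below converts net counts to nanograms of labeled LF using the specific activity stated above, and expresses binding in the presence of competitor as a percentage of binding without competitor. It assumes, for simplicity, that counting efficiency and isotope decay can be ignored; the function names and the background-subtraction step are hypothetical.

```python
# Specific activity of the labeled LF as stated in the text (cpm per microgram).
SPECIFIC_ACTIVITY_CPM_PER_UG = 3.1e6


def bound_ng(sample_cpm, background_cpm=0.0,
             specific_activity=SPECIFIC_ACTIVITY_CPM_PER_UG):
    """Convert net counts per minute to nanograms of labeled LF bound.

    Assumption of this sketch: counter efficiency and radioactive decay
    since labeling are neglected.
    """
    net_cpm = max(sample_cpm - background_cpm, 0.0)
    return net_cpm / specific_activity * 1000.0  # micrograms -> nanograms


def percent_of_max_binding(cpm_with_competitor, cpm_without_competitor,
                           background_cpm=0.0):
    """Labeled LF bound with competitor, as a percentage of the
    no-competitor well, after background subtraction."""
    maximum = cpm_without_competitor - background_cpm
    if maximum <= 0:
        raise ValueError("no-competitor counts must exceed background")
    return 100.0 * (cpm_with_competitor - background_cpm) / maximum
```

The competitor concentration at which this percentage falls to 50 is the quantity compared between LF and the mutant proteins.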

Using this assay, the LF mutant proteins having
amino-terminal deletions were found incapable of binding to PA,
thereby explaining their lack of toxicity. Carboxyl-terminal
deleted LF proteins did bind to PA in a dose dependent manner,
although they had slightly lower affinity than LF. The
proteins deleted in the repeat region could not be tested for
competitive binding because their instability prevented
purification of intact protein.

The EC₅₀ for LF¹⁻²⁵⁴.TR.PE³⁶²⁻⁶¹³ binding was found to
be 220 ng/ml, which is similar to that of LF, 300 ng/ml.
Therefore the binding data correlate well with the toxicity of
this construct. In contrast, neither LF¹⁻¹⁹⁸.TR.PE³⁶²⁻⁶¹³ nor
LF¹⁻⁷⁹.TR.PE³⁶²⁻⁶¹³ bound to PA63 on cells, thereby explaining
their lack of toxicity.

EXAMPLE 3: Construction of Genes Encoding PA Fusion Proteins 
The genes encoding PA (or PA truncated at the carboxyl
terminus to abrogate binding to the PA receptor) and an
alternative targeting moiety (a single-chain antibody, growth
factor, or other cell type-specific domain) are spliced using
conventional molecular biological techniques. The PA gene is
readily available, and the genes encoding alternative
targeting domains are derived as described below.
Single-chain antibodies (sFv)
See Example 4, below.
Growth factors and other targeting proteins

The nucleotide sequences of genes encoding a number of
growth factors and other proteins that are targeted to
specific cell types or classes are reported in freely
accessible databases (e.g., GenBank), and in many cases the
genes are available. In circumstances where this is not the
case, genes can be cloned from genomic or cDNA libraries,
using probes based on the known nucleotide sequence of the
gene that codes for the growth factor, or derived from a
partial amino acid sequence of the protein (see, e.g.
Sambrook, supra). Alternatively, genes encoding the growth
factor or other targeting moiety can be produced de novo from
chemically synthesized overlapping oligonucleotides, using the
preferred codon usage of the expression host. For example,
the gene for human epidermal growth factor (urogastrone) was
synthesized from the known amino acid sequence of human
urogastrone using yeast preferred codons. The cloned DNA,
under control of the yeast GAPDH promoter and yeast ADH-1
terminator, expresses a product having the same properties as
natural human urogastrone. The product of this synthesized
gene is nearly identical to that of the natural urogastrone,
the only difference being that the product of the synthetic
gene has a tryptophan at amino acid 13, while the other has a
tyrosine (Urdea et al. Proc. Natl. Acad. Sci. USA 80:7461-7465,
1983).
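
The de novo route described above begins by back-translating a known amino acid sequence into DNA using codons preferred by the expression host. The Python sketch below illustrates only that step. The codon table is an illustrative set of codons commonly regarded as preferred in Saccharomyces cerevisiae and should be checked against a current codon-usage table before use; the function name and the example peptide are hypothetical, and the peptide is not the urogastrone sequence.

```python
# Illustrative yeast-preferred codon for each amino acid (an assumption of
# this sketch; verify against a codon-usage table for the intended host).
YEAST_PREFERRED_CODON = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATC",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
}


def back_translate(protein, codon_table=YEAST_PREFERRED_CODON, stop="TAA"):
    """Return a coding sequence for `protein` (one-letter codes) using one
    preferred codon per residue, followed by a stop codon."""
    codons = []
    for residue in protein.upper():
        if residue not in codon_table:
            raise ValueError(f"no codon listed for residue {residue!r}")
        codons.append(codon_table[residue])
    return "".join(codons) + stop


# Hypothetical three-residue peptide used only to show the call.
coding_sequence = back_translate("MKW")
```

In practice the resulting sequence would then be divided into overlapping oligonucleotides for chemical synthesis, a step that this sketch does not attempt.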

Expression of PA Fusion proteins

Once constructed, genes encoding PA-fusion proteins
are expressed in Bacillus anthracis, and recombinant proteins
are purified by one of the following methods: (i) size-based
chromatographic separation; (ii) affinity chromatography. In
the case of PA-sFv fusions, immobilized metal chelate affinity
chromatography may be the purification method of choice,
because addition of a string of six histidine residues at the
carboxyl terminus of the sFv will have no detrimental effect
on binding to antigen. Additional methods of expression of
PA-fusion proteins utilize an in vitro rabbit reticulocyte
lysate-based coupled transcription/translation system, which
has been demonstrated to accurately refold chimeric proteins
consisting of an sFv fused to diphtheria toxin, or Pseudomonas
exotoxin A as demonstrated in Example 4.
Functional testing of PA Fusion proteins

After expression and purification, functionality of
PA-fusion proteins is tested by determining their ability to
act in concert with an LF-PE fusion protein to inhibit protein
synthesis in an appropriate cell line. Using a PA-anti human
transferrin receptor sFv fusion as a model, the following
properties are examined: (i) Cell type-specificity (protein
synthesis should be inhibited in cell lines which express the
human transferrin receptor, but not in those which do not);
(ii) Independence of toxicity from PA receptor binding (excess
free PA should have no effect on toxicity of the PA-sFv/LF-PE
complex); (iii) Competitive inhibition by excess free antibody
(toxicity should be abrogated in the presence of excess sFv,
or the monoclonal antibody from which it was derived). For
example, such tests are described in Examples 4 and 5. These
studies and other studies are used to confirm that PA has been
successfully re-routed to an alternative receptor to permit
the use of the present anthrax toxin-based cell type-specific
cytotoxic agents for the treatment of disease.

EXAMPLE 4: Generating Fusion Proteins with Single-chain
Antibodies
Reagents

Methionine-free rabbit reticulocyte lysate-based
coupled transcription/translation reagents, recombinant
ribonuclease inhibitor (rRNasin), and cartridges for the
purification of plasmid DNA were purchased from Promega
(Madison, WI). Tissue culture supplies were from GIBCO (Grand
Island, NY) and Biofluids (Rockville, MD). OKT9 monoclonal
antibody was purchased from Ortho Diagnostic Systems (Raritan,
NJ). PCR reagents were obtained from Perkin-Elmer Cetus
Instruments (Norwalk, CT), and restriction and nucleic acid
modifying enzymes (including M-MLV reverse transcriptase) were
from GIBCO-BRL (Gaithersburg, MD). A Geneclean kit for the
recovery of DNA from agarose gels was supplied by BIO 101 (La
Jolla, CA). Hybridoma mRNA was isolated using a Fast Trak
mRNA isolation kit (Invitrogen, San Diego, CA). All isotopes
were purchased from Du Pont-New England Nuclear (Boston, MA),
except [Adenylate-³²P]NAD, which was supplied by ICN
Biomedicals (Costa Mesa, CA). Pseudomonas exotoxin A was
obtained from List Biologicals (Campbell, CA).
Oligonucleotides were synthesized on a dual column
Milligen-Biosearch Cyclone Plus DNA synthesizer (Burlington,
MA), and purified using OPC cartridges (Applied Biosystems,
Foster City, CA). DNA templates were sequenced using a
Sequenase II kit (United States Biochemical Corp., Cleveland,
OH), and SDS-polyacrylamide gel electrophoresis (PAGE) was
performed using 10-20% gradient gels (Daiichi, Tokyo, Japan).
After electrophoresis, gels were fixed in 10% methanol/7%
acetic acid, and soaked in autoradiography enhancer (Amplify,
Amersham, Arlington Heights, IL). After drying,
autoradiography was performed overnight using X-OMAT AR2 film
(Eastman Kodak, Rochester, NY).
Plasmids 

The vector pET-11d is available from Novagen, Inc.,
Madison, WI. Plasmids were maintained and propagated in E.
coli strain XL1-Blue (Stratagene, La Jolla, CA).

Cell Lines

K562, a human erythroleukemia-derived cell line [ATCC
CCL 243] known to express high levels of the human transferrin
receptor at the cell surface, was cultured in RPMI 1640 medium
containing 24 mM NaHCO₃, 10% fetal calf serum, 2 mM glutamine,
1 mM sodium pyruvate, 0.1 mM nonessential amino acids, and 10
µg/ml gentamycin. An African green monkey kidney line, Vero
(ATCC CCL 81), was grown in Dulbecco's modified Eagle's medium
(DMEM) supplemented as indicated above. The OKT9 hybridoma
(ATCC CRL 8021), which produces a MoAb (IgG₁) reactive to the
human transferrin receptor, was maintained in Iscove's
modified Dulbecco's medium containing 20% fetal calf serum, in
addition to the supplements described above. All cell lines
were cultured at 37°C in a 5% CO₂ humidified atmosphere.
Construction of sFv from Hybridomas

Antibody VL and VH genes were cloned using a
modification of a previously described technique (Larrick et
al. Biotechniques 7:360, 1989; Orlandi et al. Proc. Natl.
Acad. Sci. USA 86:3833, 1989; Chaudhary et al., 1990).
Briefly, mRNA was isolated from 1 x 10 s antibody producing
hybridoma cells, and approximately 3 µg was reverse
transcribed with M-MLV reverse transcriptase, using random
hexanucleotides as primers. The resulting cDNA was screened
with two sets of PCR primer pairs designed to ascertain from
which Kabat gene family the heavy and light chains were
derived (Kabat et al. Sequences of proteins of immunological
interest. Fifth Edition. (Bethesda, Maryland: U.S. Public
Health Service, 1991)). Having identified the most effective
primer pairs, cDNAs encoding VL and VH were spliced,
separated by a region encoding a 15 amino acid peptide linker,
using a previously described PCR technique known as gene
splicing by overlap extension (SOE) (Johnson & Bird Methods
Enzymol. 203:88, 1991). The sFv gene was then cloned into
pET-11d, in frame and on the 5'-side of the PE40 gene, such
that expression of the construct should generate an sFv-PE40
fusion protein approximately 70 kDa in size.
Design of primers for PCR amplification of V region genes 
The first and third complementarity determining
regions (CDRs) of terminally rearranged immunoglobulin
variable region genes are flanked by conserved sequences (the
first framework region, FR1, on the 5' side of CDR1, and the
fourth framework region, FR4, on the 3' side of CDR3).

Although murine variable region genes have been
successfully cloned, regardless of family, with just two pairs
of highly degenerate primers (one pair for VL and another for
VH) (Gussow et al. Cold Spring Harbor Symp. Quant. Biol.
54:265, 1989; Orlandi et al., 1989; Chaudhary et al., 1990;
Batra et al., 1991), the method may not be effective in cases
where the number of mismatches between primers and the target
sequence is extensive. With this in mind, using the Kabat
database of murine V gene sequences the present invention
provides a set of ten FR1-derived primers (six for VL and four
for VH), such that any of the database sequences selected at
random would have a maximum of three mismatches with the most
homologous primer. This set of primers can be used
effectively to clone V region genes from a number of MoAb
secreting cell lines.
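
The design criterion stated above (no more than three mismatches between any database sequence and its most homologous primer) can be checked computationally. The Python sketch below counts mismatches between a degenerate primer and an equal-length target region, using the standard IUPAC nucleotide ambiguity codes, and selects the best primer from a set. The primer names and sequences in the usage lines are hypothetical placeholders and are not the primers of the invention.

```python
# Standard IUPAC nucleotide ambiguity codes: each symbol maps to the
# bases it is allowed to match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, target):
    """Count positions where `target` is not allowed by the (possibly
    degenerate) `primer`. Both are written 5' to 3' on the same strand."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have equal length")
    return sum(
        1 for p, t in zip(primer.upper(), target.upper())
        if t not in IUPAC[p]
    )


def best_primer(primers, target):
    """Return (name, mismatch_count) for the most homologous primer."""
    name = min(primers, key=lambda n: mismatches(primers[n], target))
    return name, mismatches(primers[name], target)


# Hypothetical primers and target region, for illustration only.
example_primers = {"P1": "GAYATTGTG", "P2": "CAGGTSCAG"}
name, count = best_primer(example_primers, "GACATTGTC")
meets_criterion = count <= 3
```

Applying `best_primer` to every FR1 sequence in a database and taking the largest mismatch count returned would test whether a candidate primer set meets the three-mismatch limit.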

Assembly of the OKT9 sFv gene

mRNA isolated from the hybridoma secreting the OKT9
MoAb was converted to cDNA as described previously (Larrick et
al., 1989; Orlandi et al., 1989; Chaudhary et al., 1990).
Despite the fact that CL-UNI is the partnering oligonucleotide
in each case, a product of the required size (approximately 400
bp) is not produced by VL primers IV/VI, IIa or IIb. This
suggests that mismatches between these primers and the target
sequence were too extensive to allow efficient amplification.
A similar argument can be used to explain the failure of VH
primers I and III to produce the required product. It is
clear that primers VL-I/III and VH-V are most effective at
amplifying the OKT9 VL and VH genes respectively. PCR
amplified OKT9 VL and VH genes were spliced together using the
SOE technique, as previously described (Johnson & Bird, 1991).
A synthetic DNA sequence encoding a 15 amino acid linker was
inserted between the variable regions; this linker has been
used very effectively in the production of functional sFv
(Huston et al., 1991; Johnson & Bird, 1991), and appears to
allow the variable chains to assume the optimum orientation
for antigen binding. Following splicing of V region genes by
the SOE procedure, the DNA fragment encoding the OKT9 sFv was
electrophoresed through a 1.5% agarose gel, purified by the
Geneclean technique, digested with the appropriate pair of
restriction enzymes, and cloned into the pET-11d expression
vector in frame and on the 5' side of the PE40 gene.
In vitro expression of sFv-PE40 fusion proteins 

Plasmid templates were transcribed and translated
using a rabbit reticulocyte lysate-based
transcription/translation system, according to the
instructions of the manufacturer, in 96-well microtiter plate
format. L-[³⁵S]methionine-labeled proteins (for analysis by
SDS-PAGE) and unlabeled proteins (for enzymatic analysis and
bioassay) were produced in similar conditions, except that the
isotope was replaced with 20 µM unlabeled L-methionine in the
latter case. Control lysate was produced by adding all reagents
except plasmid DNA. After translation, unlabeled samples were
dialysed overnight at 4°C against phosphate-buffered saline
(PBS), pH 7.4 in Spectra/Por 6 MWCO (molecular weight cutoff)
50,000 tubing (Spectrum, Houston, TX).

Constructs incorporating the aberrant kappa transcript
will contain a translation termination codon in the VL chain
as previously described, and would therefore be expected to
generate a translation product approximately 12 kDa in size.
On the other hand, constructs which have incorporated the
productive VL gene contain no such termination codon, and a
full-length fusion protein (approximately 70 kDa in size)
should be produced.
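
The size distinction drawn above can be anticipated from sequence alone: an in-frame termination codon truncates the product, and the expected mass follows roughly from the number of residues translated. The Python sketch below translates an open reading frame up to the first stop codon using the standard genetic code and estimates the mass with an average of about 110 Da per residue, a common rule of thumb that is an assumption of this sketch and not a value taken from the experiments. The function names and test sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_until_stop(orf):
    """Translate `orf` from its first base, stopping at the first
    in-frame termination codon or at the end of the sequence."""
    residues = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3].upper()]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def approx_mass_kda(protein, avg_residue_da=110.0):
    """Rough molecular mass from residue count (rule-of-thumb average)."""
    return len(protein) * avg_residue_da / 1000.0


# Hypothetical short sequence with an early in-frame stop codon.
truncated = translate_until_stop("ATGGCTTAAGGG")
```

On this estimate a product of roughly 110 residues would run near 12 kDa, whereas a full-length fusion of roughly 640 residues would run near 70 kDa, consistent with the two sizes discussed above.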

In vitro expression studies were used to determine the
size of the protein encoded by the OKT9 sFv-PE40 gene. The
constructs tested in this experiment clearly produce a protein
of approximately 70 kDa, indicating that the clones do not
contain the aberrant VL gene, and are devoid of frameshift
mutations. Of several OKT9 sFv constructs tested, none
apparently incorporated the incorrect VL gene. However, in
the case of another sFv generated by this method (1B7 sFv,
derived from a MoAb which binds to pertussis toxin), the
majority of the clones tested produced a 12 kDa protein, and
were found to contain the aberrant transcript on DNA
sequencing. It should be noted that the 12 kDa fragment is
frequently obscured in 10-20% gradient gels by unincorporated
³⁵S-methionine which co-migrates with the dye front.

Determination of Protein Concentration

The enzymatic activities of fusion proteins were
compared with those of known concentrations of PE in an
ADP-ribosyl transferase assay, allowing molarities to be
determined (Johnson et al. J. Biol. Chem. 263:1295-1399,
1988). Samples were adjusted to contain equivalent
concentrations of lysate, thus maintaining an identical amount
of substrate (elongation factor 2) in all cases.
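
One way to carry out the comparison described above is to fit a straight line to the activities of the PE standards and read unknown concentrations off that line. The Python sketch below does this by ordinary least squares and assumes that the assay responds linearly over the range of the standards; the function names and the numerical values in the usage lines are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired standards")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_activity(activity, slope, intercept):
    """Invert the standard curve to obtain the concentration of enzyme
    that would give the measured activity."""
    return (activity - intercept) / slope


# Hypothetical PE standards (concentration units arbitrary) and the
# radioactivity transferred to elongation factor 2 for each (cpm).
standard_conc = [1.0, 2.0, 4.0]
standard_cpm = [110.0, 210.0, 410.0]
slope, intercept = fit_line(standard_conc, standard_cpm)
unknown_conc = concentration_from_activity(310.0, slope, intercept)
```

Because every sample contains the same amount of lysate, the intercept of the fitted line also absorbs any background activity contributed by the lysate itself.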
Protein Synthesis Inhibition Assay for Functional sFv-PE40
Binding

Binding of the OKT9 sFv to the human transferrin
receptor was qualitatively determined by assessing the ability
of the OKT9 sFv-PE40 fusion protein to inhibit protein
synthesis in the K562 cell line. Pseudomonas exotoxin A is a
bacterial protein which is capable of inhibiting de novo
protein synthesis in a variety of eukaryotic cell types. The
toxin binds to the cell surface, and ultimately translocates
to the cytosol where it enzymatically inactivates elongation
factor 2. PE40 is a mutant form of exotoxin A which lacks a
binding domain, but is enzymatically active, and capable of
translocation. Fusion proteins containing PE40 and an
alternative binding domain (for example, an sFv to a cell
surface receptor) will inhibit protein synthesis in an
appropriate cell line only if the sFv binds to a cell-surface
antigen which subsequently internalizes into an acidified
endosome (Chaudhary et al., 1989). The TfnR is such an
antigen, so a qualitative assessment of binding may be
determined by measuring the ability of the OKT9 sFv-PE40
fusion protein to inhibit protein synthesis in a cell line
like K562, which expresses the TfnR. Protein synthesis
inhibition assays were performed as described previously
(Johnson et al., 1988). Briefly, samples were serially
diluted in ice cold PBS, 0.2% BSA, and 11 µl volumes were added
to the appropriate well of a 96-well microtiter plate
(containing 10⁴ cells/100 µl/well in leucine-free RPMI 1640).
After carefully mixing the contents of each well, the plate
was incubated for the indicated time at 37°C in a 5% CO₂
humidified atmosphere. Each well was then pulsed with 20 µl of
L-[¹⁴C(U)]leucine (0.1 µCi/20 µl), incubated for 1 hour, and
harvested onto glass fiber filters using a PHD cell harvester
(Cambridge Technology, Cambridge, MA). Results are expressed
as a percentage of the isotope incorporation in cells treated
with appropriate concentrations of control dialyzed lysate.
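
The arithmetic behind the dilution scheme can be sketched as follows in Python. The sketch reads the added sample volume as 11 µl and the well volume as 100 µl, which gives approximately a further ten-fold dilution in the well; the ten-fold step between serial dilutions, the function names, and the example concentrations are assumptions of the sketch and are not stated in the procedure.

```python
def final_concentration(sample_conc, added_ul=11.0, well_ul=100.0):
    """Concentration in the well after adding `added_ul` of sample to
    `well_ul` of cell suspension (same units as `sample_conc`)."""
    return sample_conc * added_ul / (added_ul + well_ul)


def serial_dilution(start_conc, fold=10.0, steps=5):
    """Concentrations of a dilution series starting at `start_conc`,
    each step diluted `fold`-fold from the previous one (the fold
    value is an assumption of this sketch)."""
    return [start_conc / fold ** i for i in range(steps)]


# Hypothetical stock concentration in arbitrary units.
series = serial_dilution(1000.0, fold=10.0, steps=3)
in_well = [final_concentration(c) for c in series]
```

Reporting the in-well values, and not the concentrations of the diluted samples, keeps the IC₅₀ comparable between experiments that use different addition volumes.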

The results of this assay clearly indicate that OKT9
sFv-PE40 is capable of inhibiting protein synthesis with an
IC₅₀ (the concentration of a reagent which inhibits protein
synthesis by 50%) of approximately 2 × 10⁻⁹ M. The toxicity
of the fusion protein, but not of PE, was abrogated in the
presence of excess OKT9 MoAb (12 µg/ml), indicating that
binding is specific for the TfnR. No toxicity was observed
when K562 was substituted with Vero (an African Green monkey
cell line which expresses the simian version of the
transferrin receptor), indicating that the OKT9 sFv retains
the human receptor-specific antigen binding properties of the
parent antibody.

Having demonstrated binding of the OKT9 sFv to TfnR,
its nucleotide sequence was determined using dideoxynucleotide
chain-terminating methods, confirming extensive homology with
the respective regions of immunoglobulins of known sequence.

EXAMPLE 5: Characterization of single-chain antibody (sFv)-toxin
fusion proteins produced in vitro in rabbit reticulocyte
lysate

The present invention provides in vitro production of
proteins containing a toxin domain (derived from Diphtheria
toxin (DT) or PE) fused to a domain encoding a single-chain
antibody directed against the human transferrin receptor
(TfnR). The expression of this antigen on the cell surface is
coordinately regulated with cell growth; TfnR exhibits a
limited pattern of expression in normal tissue, but is widely
distributed on carcinomas and sarcomas (Gatter et al. J.
Clin. Pathol. 36:539-545, 1983), and may therefore be a
suitable target for immunotoxin-based therapeutic strategies
(Johnson, V. G. and Youle, R. J. "Intracellular Trafficking of
Proteins" Cambridge Univ. Press, Cambridge England, Steer and
Hover eds., pp. 183-225; Batra et al., 1991; Johnson et al.,
1988).

Proteins consisting of a fusion between an sFv
directed against the TfnR and either the carboxyl-terminal 40
kDa of PE, or the DT mutant CRM 107 [S(525)F] were expressed
in rabbit reticulocyte lysates, and found to be specifically
cytotoxic to K562, a cell line known to express TfnR. In
comparison, a chimeric protein consisting of a fusion between
a second DT mutant, DTM1 [S(508)F, S(525)F] and the E6 sFv
exhibited significantly lower cytotoxicity. Legal
restrictions imposed on manipulating toxin genes in vivo
previously prevented expression of potentially interesting
toxin-containing fusion proteins (Federal Register
51(88)(III):169S1 and Appendix P:16971); the present invention
provides a novel procedure for in vitro gene construction and
expression which satisfies the regulatory requirements,
facilitating the first study of the potential of non-truncated
DT mutants in fusion protein ITs. The present data also
demonstrate that functional recombinant antibodies can be
generated in vitro.
Reagents 

DT and PE were purchased from List Biologicals
(Campbell, CA). Nuclease treated, methionine-free rabbit
reticulocyte lysate and recombinant ribonuclease inhibitor
(rRNasin) were obtained from Promega (Madison, WI). Tissue
culture supplies were from GIBCO (Grand Island, NY) and
Biofluids (Rockville, MD). Reagents for PCR were provided by
Perkin-Elmer Cetus (Norwalk, CT). Restriction and nucleic
acid modifying enzymes were from Stratagene (La Jolla, CA), as
was the mCAP kit used to produce capped mRNA in vitro.
Geneclean and RNaid kits (for the purification of DNA and RNA
respectively) were supplied by BIO 101 (La Jolla, CA).
L-[³⁵S]methionine, L-[¹⁴C(U)]leucine and
5'-(alpha-thio)-[³⁵S]dATP were from New England Nuclear
(Boston, MA). [Adenylate-³²P]NAD was supplied by ICN
Biomedicals (Costa Mesa, CA).

Oligonucleotide Synthesis 

Oligonucleotides were synthesized (0.2 µM scale), using
cyanoethylphosphoramidites supplied by Milligen-Biosearch
(Burlington, MA) on a dual column Cyclone Plus DNA
synthesizer. Post-synthesis purification was achieved using
OPC cartridges (Applied Biosystems, Foster City, CA).
Plasmids 

pET-11d was the generous gift of Dr. F. William
Studier, Brookhaven National Laboratory (Upton, NY).
pHB21-PE40, a derivative of pET-11d containing the gene for
PE40, was kindly supplied by Dr. David FitzGerald (NIH,
Bethesda, MD). All plasmids were maintained and propagated in
E. coli strain XL1-Blue (Stratagene, La Jolla, CA).
Cell Lines 

Corynebacterium diphtheriae strain C7s(β)tox+ (ATCC
27012) was obtained from the ATCC (Rockville, MD), and the
strain producing the binding-deficient DT mutant CRM 103 was
the generous gift of Dr. Neil Groman, University of Washington
(Seattle, WA). Both strains were propagated in LB broth.
K562 (a human erythroleukemia-derived cell line, ATCC CCL 243)
was cultured in RPMI 1640 medium containing 24 mM NaHCO₃, 10%
fetal calf serum, 2 mM glutamine, 1 mM sodium pyruvate, 0.1 mM
nonessential amino acids, and 10 µg/ml gentamycin. Vero (an
African green monkey kidney line, ATCC CCL 81) was grown in
Dulbecco's modified Eagle's medium supplemented as described
above. All eukaryotic cells were cultured at 37°C in a 5% CO₂
humidified atmosphere.
Splicing Genes using PCR 

Genes encoding antibody VL and VH were spliced,
separated by a region encoding a 15 amino acid peptide linker,
using a previously described PCR technique known as gene
splicing by overlap extension (SOE) (Horton et al. Gene
77:61-68, 1989; Horton et al. Biotechniques 8:528-535, 1990).
For studies requiring in vitro expression of PCR products, tox
gene-derived fragments were linked to those encoding sFv using
a similar method, without the use of restriction enzymes.

Construction of Plasmids Encoding Toxin-sFv Fusion Proteins 
The gene encoding PE40 was obtained as an insert in
pET-11d, and the sFv gene was cloned on the 5' side of this
insert as indicated. To clone the gene encoding the DT
binding-site mutant DTM1 [S(508)F, S(525)F], genomic DNA was
isolated from the C. diphtheriae strain which produces CRM
103. DNA was extracted by a modification of the
cetyltrimethylammonium bromide extraction procedure (Wilson,
K. "Current Protocols in Molecular Biology" Ausubel et al. eds.
John Wiley & Sons New York, 2.4.1 - 2.4.5, 1988) and subjected
to 20 cycles of PCR amplification. Primers were designed to:
(i) amplify the 1605 bp region encoding CRM 103, concomitantly
mutating the codon at position 525 from TCT to TTT, and (ii)
incorporate restriction sites appropriate for cloning. The
mutations present in CRM 107 and CRM 103 were thus combined on
a single gene.
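
The codon change described above (TCT to TTT at position 525, converting serine to phenylalanine) can be modeled as a simple in-frame substitution. The Python sketch below performs such a substitution after confirming that the codon being replaced is the one expected. It assumes that codon 1 begins at the first base of the supplied sequence; the function name and the short test sequence are hypothetical and bear no relation to the actual tox gene sequence.

```python
def mutate_codon(orf, position, expected_old, new_codon):
    """Replace the codon at 1-based `position` in `orf` with `new_codon`.

    Raises ValueError if the codon found at that position is not
    `expected_old`, which guards against numbering errors (for example,
    counting from the signal peptide instead of the mature protein).
    """
    if len(new_codon) != 3 or len(expected_old) != 3:
        raise ValueError("codons must be three bases long")
    start = (position - 1) * 3
    found = orf[start:start + 3].upper()
    if found != expected_old.upper():
        raise ValueError(
            f"expected {expected_old} at codon {position}, found {found!r}"
        )
    return orf[:start] + new_codon.upper() + orf[start + 3:]


# Hypothetical nine-base sequence: the second codon is changed from
# TCT (serine) to TTT (phenylalanine).
mutated = mutate_codon("ATGTCTGGT", 2, "TCT", "TTT")
```

In the PCR-based procedure the same change is introduced by a primer that carries the altered base, so that every amplified copy contains the new codon.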

In Vitro Transcription of DNA Templates 
For transcription, DNA templates required a T7 RNA
polymerase promoter immediately upstream of the gene of
interest (Oakley, J. L. and Coleman, J. E. Proc. Natl. Acad.
Sci. U.S.A. 74:4266-4270, 1977). Such a promoter was
conveniently present in pET-11d (Studier et al. Methods
Enzymol. 185:60-89, 1990).
In the case of PCR products, the upstream primer (a 57-mer,
T7-DT) was used to introduce all of the elements necessary for
in vitro transcription/translation. T7-DT includes a
consensus T7 RNA polymerase promoter, together with the first
seven codons of mature DT (Greenfield et al. Proc. Natl. Acad.
Sci. U.S.A. 80:6853-6857, 1983) immediately preceded by an ATG
translation initiation codon in the optimum Kozak context
(Kozak, M. J. Biol. Chem. 266:19867-19870, 1991).
m⁷G(5')ppp(5')G-capped RNA was produced by transcription from
linearized plasmids or PCR products using an mCAP kit,
according to the manufacturer's protocol. Prior to
translation, RNA was purified using an RNaid kit, recovered in
nuclease free water, and analyzed by formaldehyde gel
electrophoresis.

In Vitro Expression of Fusion Proteins 

L-[35S]methionine-labelled proteins (for analysis by
SDS-PAGE) were produced from capped RNA in methionine-free,
nuclease treated rabbit reticulocyte lysate, according to the
supplier's instructions. Unlabeled proteins (for bioassay)
were produced in similar conditions, except that the isotope
was replaced with 20 µM unlabeled L-methionine. Control
lysate was produced by adding all reagents except exogenous
RNA. After translation, samples were dialyzed overnight at
4°C against PBS, pH 7.4 in Spectra/Por 6 MWCO 50,000 tubing
(Spectrum, Houston, TX).

Prior to transcription, plasmids were linearized at
the BglII site and treated with proteinase K to destroy
ribonucleases that may contaminate the sample. After
phenol/chloroform extraction and ethanol precipitation, DNA
was dissolved in nuclease free water to a concentration of
approximately 0.2 µg/µl. m7G(5')ppp(5')G-capped RNA was
synthesized by T7 RNA polymerase using the conditions
recommended by the manufacturer, and its integrity was
confirmed by formaldehyde gel electrophoresis. Capped RNA was
translated in a commercially available rabbit reticulocyte
lysate, according to the instructions of the manufacturer. It
is clear from the gel that the major band in each case has a
molecular weight corresponding to that of the protein of
interest, and that relatively large molecules (approximately
120 kDa in the case of DTM1-E6 sFv-PE40) can be synthesized in
the lysate using the conditions described.

Immediately following translation, samples were
extensively dialyzed overnight at 4°C against PBS, pH 7.4.
The dialysis step was found to be essential, because
non-dialyzed rabbit reticulocyte lysate resulted in the
incorporation of significantly lower amounts of 14C-leucine
upon assay by protein synthesis inhibition in all cell lines
tested. After determining the concentration of the newly
synthesized protein using a standard assay for measuring
ADP-ribosyltransferase activity (Johnson et al., 1988), the
cytotoxic activity of samples was immediately determined.

ADP-ribosyl Transferase Assay

The enzymatic activity (and therefore molarity) of
fusion proteins was determined by comparison with DT or PE
standard curves, as described previously (Johnson et al.,
1988). Appropriate volumes of control lysate were added to
each standard curve sample, in order to control for the
presence of significant levels of EF-2 in reticulocyte lysate.
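
Reading a molar concentration off a standard curve, as described above, can be sketched as a simple linear interpolation between bracketing standards. This is a minimal illustration only: the function name, the units and every numerical value below are hypothetical placeholders, and the actual curve-fitting procedure of Johnson et al. (1988) is not reproduced here.

```python
# Illustrative sketch: all names and numbers are hypothetical placeholders.

def interpolate_concentration(standards, signal):
    """Estimate concentration from an assay signal by linear interpolation.

    `standards` is a list of (concentration, signal) pairs measured for the
    toxin standard. Raises ValueError if `signal` lies outside the curve.
    """
    points = sorted(standards, key=lambda pair: pair[1])  # sort by signal
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi and s_hi != s_lo:
            fraction = (signal - s_lo) / (s_hi - s_lo)
            return c_lo + fraction * (c_hi - c_lo)
    raise ValueError("signal lies outside the range of the standard curve")


# Hypothetical standard curve: (concentration in nM, incorporated cpm).
dt_standards = [(0.0, 100.0), (1.0, 600.0), (2.0, 1100.0), (4.0, 2100.0)]

estimate_nm = interpolate_concentration(dt_standards, 850.0)
print(f"estimated concentration: {estimate_nm:.2f} nM")
```

Refusing to extrapolate beyond the standards is a deliberate choice in this sketch; a sample falling outside the curve would need to be diluted and re-assayed rather than estimated.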

Other Methods

SDS-PAGE was performed as previously described
(Laemmli, U. K. Nature 227:680-685, 1970), using 10-20%
gradient gels (Daiichi, Tokyo, Japan). Once electrophoresis
was complete, gels were fixed for 15 minutes in 10% methanol,
7% acetic acid, and then soaked for 30 minutes in
autoradiography enhancer (Amplify, Amersham, Arlington Heights,
IL). After drying, autoradiography was performed overnight
using X-OMAT AR2 film (Eastman Kodak, Rochester, NY), in the
absence of intensifying screens. Dideoxynucleotide
chain-termination sequencing of double-stranded DNA templates was
performed using a Sequenase II kit (United States Biochemical
Corp., Cleveland, OH), according to the manufacturer's
protocol.

Cytotoxicity of Toxin-sFv Fusion Proteins Expressed in
Reticulocyte Lysates

The cytotoxic activity of fusion proteins was
determined by their ability to inhibit protein synthesis in
relevant cell lines (e.g., K562). Assays were performed as
described previously (Johnson et al., 1988). Briefly, samples
were serially diluted in ice cold PBS, 0.2% BSA, and 11 µl
volumes were added to the appropriate well of a 96-well
microtiter plate (containing 10^4 cells/well in leucine-free
RPMI 1640). After carefully mixing the contents of each well,
the plate was incubated for the indicated time at 37°C in a 5%
CO2 humidified atmosphere. Each well was then pulsed with
20 µl of L-[14C(U)]leucine (0.1 µCi/20 µl), incubated for 1
hour, and harvested onto glass fiber filters using a PHD cell
harvester (Cambridge Technology, Cambridge, MA). Results were
expressed as a percentage of the isotope incorporation in
cells treated with appropriate concentrations of control
dialyzed lysate.
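
The way such results are expressed, as a percentage of control incorporation, and the way an IC50 can be read from a dilution series may be sketched as follows. The interpolation on a log-concentration scale is an assumption made for illustration and is not stated in the text; the function names and all counts are hypothetical.

```python
import math

# Illustrative sketch: function names and all data are hypothetical.

def percent_of_control(sample_cpm, control_cpm):
    """Express leucine incorporation as a percentage of the control value."""
    return 100.0 * sample_cpm / control_cpm


def ic50(concentrations, percents):
    """Concentration giving 50% of control incorporation.

    Interpolates linearly on a log10 concentration scale between the two
    adjacent dilutions that bracket 50%. Concentrations must be ascending.
    Returns None if 50% is never crossed.
    """
    points = list(zip(concentrations, percents))
    for (c_lo, p_lo), (c_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= 50.0 >= p_hi and p_lo != p_hi:
            fraction = (p_lo - 50.0) / (p_lo - p_hi)
            log_c = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None


# Hypothetical dilution series (molar) and counts per minute.
control_cpm = 1000.0
doses_molar = [1e-12, 1e-11, 1e-10, 1e-9]
sample_cpm = [980.0, 900.0, 500.0, 100.0]

percents = [percent_of_control(cpm, control_cpm) for cpm in sample_cpm]
estimate = ic50(doses_molar, percents)
print(percents, estimate)
```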

The results of the protein synthesis inhibition assay
clearly indicate that PE40-containing fusion proteins
synthesized in cell-free reticulocyte lysates are highly
cytotoxic to this cell line (IC50 1 x 10^-10 M). In contrast,
DTM1-E6 sFv was at least ten-fold less toxic to K562 than the
PE40-containing fusion protein, despite the fact that it
exhibited ADP-ribosyl transferase activity indistinguishable
from that of wt DT synthesized from an equivalent amount of
RNA in an identical reticulocyte lysate mix. Since the
decreased toxicity of DTM1-E6 sFv is clearly not due to a
deficit in enzymatic activity, the binding and/or
translocation process is implicated. Possible mechanisms by
which the sFv-antigen interaction could be inhibited include:
(i) misfolding of the sFv domain or (ii) steric interactions
with other regions of the fusion protein preventing close
association of sFv with the TfnR. It is of interest that a
tripartite protein, DTM1-E6 sFv-PE40, was significantly
cytotoxic to K562 (IC50 around 1 x 10^-10 M, similar to that of
PE40-E6 sFv), and the toxic effect was clearly mediated via
the TfnR, since this activity was blocked by addition of
excess E6 Mab. Although it is possible that the inclusion of
the PE40 moiety at the carboxyl end of the tripartite molecule
results in a significant conformational change in domains more
proximal to the amino terminus, it seems unlikely that the sFv
binding domain of DTM1-E6 is misfolded, or unavailable to
interact with the TfnR. Interactions of DTM1-E6 sFv with the
cell surface could be measured in a direct binding assay
(Greenfield et al. Science 238:536-539, 1987), but these
studies were not performed in the course of this
investigation. Nevertheless, it appears likely that the lack
of toxicity of the DTM1-E6 sFv fusion protein is due to a
deficit in its translocation function.

The expression system developed is rapid and easy, and
facilitates the manipulation of a number of samples at once.
No complicated protein purification or refolding procedures
are required, and the method can be used to express proteins
which, due to restrictions imposed on the manipulation of
toxin-encoding genes, could not be produced by more
conventional methods. The technique is ideal for ascertaining
the suitability of new sFv for IT development; it is
theoretically possible to assemble the sFv-encoding gene (and
that encoding the IT itself) by splicing of PCR products
derived directly from the hybridoma, without the necessity for
cloning. This would facilitate the selection of the most
promising candidate molecule, prior to investing considerable
effort and expense in large scale protein production and
purification. Toxins and toxin-containing fusion proteins are
proving to be powerful aids in our understanding of receptor
mediated endocytosis and intracellular routing, and are
providing valuable insight into normal cell function (reviewed
in ref. 2). The method described simplifies the generation of
such molecules, and facilitates their production and use in
laboratories in which the application of more conventional
expression methods would be impractical.

Example 6: Cassette Mutagenesis to Produce PAHIV Mutants.

Three pieces of DNA are joined together. Piece A has
vector sequences and encodes the "front half" (5' end of the
gene) of PA protein; piece B is a short piece of DNA (referred
to as a cassette) and encodes a small middle piece of PA
protein; and piece C encodes the "back half" (3' end of the
gene) of PA.

PA with alternate HIV-1 cleavage sites were created by
a cassette mutagenesis procedure. Eight deoxyoligonucleotides
were synthesized for construction of cassettes coding for
specifically designed amino acid sequences. Each of the four
cassettes was generated by annealing two synthetic
oligonucleotides (primers).

Primer 1A  CG CAA GTA TCA CAA AAT TAT CCG ATC GTG CAA AAC ATA CTG CAG G
               Q   V   S   Q   N   Y   P   I   V   Q   N   I   L   Q
Primer 1B  G TTC CTG CAG TAT GTT TTG CAC GAT CGG ATA ATT TTG TGA TAC TTG

Primer 2A  CG AAC ACT GCC ACT ATC ATG ATG CAA CGT GGT AAT TTT CTG CAG G
               N   T   A   T   I   M   M   Q   R   G   N   F   L   Q
Primer 2B  G TCC CTG CAG AAA ATT ACC ACG TTG CAT CAT GAT AGT GGC AGT GTT

Primer 3A  CG ACT GTC TCT TTT AAC TTC CCG CAA ATC ACG CTT TGG CTG CAG G
               T   V   S   F   N   F   P   Q   I   T   L   W   L   Q
Primer 3B  G TCC CTG CAG CCA AAG CGT GAT TTG CGG GAA GTT AAA AGA GAC AGT

Primer 4A  CG GGC GGT TCT GCC TTT AAC TTC CCG ATC GTC ATG GGA GGT CTG CAG G
               G   G   S   A   F   N   F   P   I   V   M   G   G   L   Q
Primer 4B  G TCC CTG CAG ACC TCC CAT GAC GAT CGG GAA GTT AAA GGC AGA ACC GCC

The underlined portion of each protein sequence is
recognized and cleaved by the HIV-1 protease.
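
The relationship between the two oligonucleotides of a cassette can be checked with a short sketch, shown here for primer pair 2. It confirms that the A and B strands are reverse complements over their shared region, leaving a 2-nucleotide 5' overhang on the A strand and a 3-nucleotide 5' overhang on the B strand, and that the A strand encodes NTATIMMQRGNFLQ. Two points are assumptions: the printed source shows "act" at one codon of primer 2B where complementarity with primer 2A requires AGT, which is used here, and the reading of the overhangs as the cohesive ends for ligation to the flanking fragments is an interpretation rather than something the text states.

```python
# Illustrative sketch: helper names are hypothetical.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG codon order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate a DNA sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


primer_2a = "".join(
    "CG AAC ACT GCC ACT ATC ATG ATG CAA CGT GGT AAT TTT CTG CAG G".split())
# Assumption: AGT at the twelfth codon (the source prints "act" there).
primer_2b = "".join(
    "G TCC CTG CAG AAA ATT ACC ACG TTG CAT CAT GAT AGT GGC AGT GTT".split())

overhang_a = primer_2a[:2]   # single-stranded 5' end of strand A
overhang_b = primer_2b[:3]   # single-stranded 5' end of strand B
duplex_ok = primer_2a[2:] == reverse_complement(primer_2b[3:])
encoded = translate(primer_2a[2:44])  # the 14 codons between CG and final G

print(overhang_a, overhang_b, duplex_ok, encoded)
```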

Primer pair 1 encodes a protein sequence which
duplicates part of the cleavage site found between the
membrane associated protein and the capsid protein.

Primer pair 2 encodes a protein sequence which
duplicates part of the cleavage site between the capsid and
the nucleocapsid protein.

Primer pair 3 encodes a protein sequence which
duplicates part of the cleavage site between the protease and
the p6 protein. Like the protease, p6 is a portion of the
large protein produced by HIV.

Primer pair 4 encodes a protein sequence which should
be cleaved by the protease. It was created by examining
several protein sequences which are recognized by the HIV
protease and using the common residues from each sequence.
Glycine residues were added to each end to make the molecule
more flexible.

The mutagenic cassettes were ligated with the
BamHI/BstBI fragment from plasmid pYS5 and the PpuMI-BamHI
fragment from plasmid pYS6. Plasmids shown to have correct
restriction maps were transformed into the E. coli dam- dcm-
strain GM2163 (available from New England Biolabs, Beverly,
MA). Unmethylated plasmid DNA was purified from each mutant
and used to transform B. anthracis. For methods, see Klimpel,
et al. Proc. Natl. Acad. Sci. 89:10277-10281 (1992). pYS5
and pYS6 construction are described in Singh, et al. J. Biol.
Chem. 264:19103-19107 (1989).

The nucleotide and amino acid sequence of the mature
PA protein after alteration with primer set 2 are shown below.
Nucleotide residues 482 to 523 were replaced with cassette 2,
resulting in replacement of amino acid residues 162-171 of PA
with residues NTATIMMQRGNFLQ to give PAHIV#2. The altered DNA
sequence and the new amino acid residues are underlined.

Sequence Range: 1 to 2220



60 



GAA GTT AAA CAG GAG AAC CGG TTATTAAAT GAA TCAGAA TCAAGT TCC CAG GGG TTACTA 
CTT CAATTT GTC CTC TTG GCC AATAATTTA CTT AGT CTT AGTTCA AGGGTC CCC AATGAT 
GluValLysGlnGluA8nArgLeuLeuAsnGluSerGluSerSerSerGlnGlyLeuLeu> 



120 
* 



GGA T AC TAT TTT AGT GAT TTG AATTTTCAAGCA CCCATG GTGGTTACCTCTTCT ACTACA 
CCT ATGATA AAA TCA CTAAAC TTAAAAGTT CGT GGGTAC CAC CAATGGAGAAGA TGATGT 
GlyTyrTyr Phe Ser Asp LeuAsnPheGlnAlaProMetValValThrSerSerThrThr> 

15 ISO 

+ 

GGG GATTTATCT ATT CCT AGT TCTGAGTTAGAAAATATT CCATCG GAAAAC CAA TATTTT 
CCC CTAAAT AGATAA GGATCA AGACTCAAT CTT TTATAA GGTAGC CTTTTG GTT ATAAAA 
GlyAepLeu Ser He Pro Ser SerGluLeuGluAsnlle ProSerGluAsnGlnTyrPhe> 



240 



CAA TCTGCT ATT TGG TCA GGA TTT AT C AAA GTT AAGAAG AGT GAT GAATAT ACATTTGCT 
GTT AGACGA TAA ACC AGT CCT AAATAGTTT CAA TTC TTC TCACTA CTT ATA TGT AAACGA 
25 GlnSerAlalleTrp SerGly PhelleLys Val LysLys SerAspGluTyrThr PheAla> 



300 



ACT TCCGCT GATAAT CAT GTA ACAATGTGG GTA GATGAC CAAGAAGTG ATT AAT AAAGCT 
3 0 TGA AGG CGA CTA TTA GTA CAT TGTTACACC CAT CTACTG GTT CTT CAC TAA TTA TTTCGA 

Thr SerAla Asp Asn His Val ThrMetTrp Val Asp Asp GlnGluVal He AsnLysAla> 



360 



3 5 TCT AATTCT AAC AAA ATC AGA TTAGAAAAA GGA AGATTA TAT CAA ATAAAA ATT CAATAT 

AGA TTAAGA TTG TTT TAG TCT AATCTTTTT CCT TCT AAT ATAGTT TATTTT TAA GTT ATA 
Ser AsnSer AsnLys He ArgLeuGluLysGlyArgLeuTyrGlnIleLysIleGlnTyr> 



420 



CAA CGAGAA AAT CCT ACT GAA AAAGGATTG GAT TTC AAG TTGTAC TGG ACC GAT TCT CAA 
GTT GCTCTT TTA GGA TGA CTT TTT CCT AAC CTA AAG TTC AAC ATG ACC TGG CTA AGAGTT 
Gin ArgGlu Asn Pro Thr Glu LysGlyLeuAsp PheLys LeuTyr TrpThr Asp SerGln> 

45 480 

AAT AAAAAA GAA GTG ATT TCT AGTGATAAC TTA CAATTG CCAGAATTAAAA CAA AAATCT 
TTA TTTTTT CTT CAC TAA AGA TCACTATTG AAT GTT AAC GGT CTT AAT TTT GTT TTT AGA 
Asn LysLys Glu Val He Ser SerAspAsnLeuGlnLeuProGluLeuLysGln LysSer> 



540 



TCG AAC ACT GCC ACT ATC ATG ATG CAA CGT GGT AAT TTT CTG CAG GGA CCT ACG GTT CCA
AGC TTG TGA CGG TGA TAG TAC TAC GTT GCA CCA TTA AAA GAC GTC CCT GGA TGC CAA GGT
SerAsnThrAlaThrIleMetMetGlnArgGlyAsnPheLeuGlnGlyProThrValPro>



600 



GAC CGTGAC AAT GAT GGA ATC CCTGATTCATTA GAGGTA GAA GGA TAT ACG GTT GATGTC 
6 0 CTG GCACTG TTA CTA CCT TAG GGACTAAGT AAT CTC CAT CTT CCT ATATGC CAA CTACAG 

Asp ArgAsp Asn Asp Gly He ProAspSer LeuGluVal GluGlyTyrThr Val Asp Val > 



660 



65 AAA AAT AAA AGA ACT TTT CTT TCAC CATGG ATT TCT AAT ATT CAT GAAAAG AAA GGATTA 

TTT TTATTT TCT TGA AAA GAA AGTGGTACC TAA AGATTA TAAGTA CTT TTC TTT CCT AAT 
LysAsnLysArgThrPheLeuSerProTrpIleSerAsnIleHisGluLysLysGlyLeu>

720

ACC AAATAT AAA TCA TCT CCT GAAAAATGG AGC ACGGCT TCTGAT CCGTAC AGT GATTTC 
TGG TTTATA TTT AGT AGA GGA CTTTTTACC TCG TGC CGA AGACTAGGCATG TCA CTAAAG 
Thr LysTyr LyB Ser Ser Pro GluLysTrp Ser Thr Ala Ser Asp ProTyr Ser AspPhe> 

780 

GAAAAGGTT ACA GGA CGG ATT GATAAGAAT GTA TCA CCA GAG GCAAGACAC CCC CTTGTG 
CTT TTCCAA TGT CCT GCC TAA CTATTCTTA CAT AGTGGT CTC CGTTCTGTG GGG GAACAC 
GluLysVal ThrGlyArg He AspLysAsnVal SerProGluAlaArgHisPro LeuVal> 

840 
★ 

GCA GCTTAT CCG ATT GTA CAT GTAGATATG GAG AATATT ATT CTC TCAAAA AAT GAG GAT 
CGT CGAATA GGC TAA CAT GTA CATCTATAC CTC TTATAA TAA GAG AGT TTT TTA CTCCTA 
AlaAlaTyr Pro He Val Hie ValAspMetGluAsnlle IleIieuSerLysAsnGluAsp> 

900 

CAATCCACA CAG AAT ACT GAT AGTG AAACG AGA ACAATA AGT AAAAAT ACT TCT ACAAGT 
GTT AGGTGT GTC TTA TGA CTA TCACTTTGC TCT TGTTAT TCA TTT TTA TG A AGA TGTTCA 
Gin SerThr Gin Asn Thr Asp SerGluThr Arg Thr He SerLysABnThr Ser ThrSer> 

960 

AGG ACACAT ACT AGT GAA GTA CATGGAAAT GCA GAAGTG CAT GCG TCG TTC TTT GAT AIT 
TCC TGTGTA TGA TCA GTT CAT GTACCTTTA CGT CTTCAC GTACGC AGCAAG AAA CTATAA 
ArgThrHisThrSerGluVal HisGlyAsnAlaGluVal HisAlaSerPhe Phe Asplle> 

1020 

GGT GGG AGT GTA TCT GCA GGA TTTAGTAAT TCG AATTCA AGT ACG GTC GCA ATT GATCAT 
CCA CCCTCA CAT AGA CGT CCT AAATCATTA AGC TTAAGT TCATGC CAG CGT TAA CTAGTA 

GlyGlySerVal Ser Ala Gly PheSerAsnSer AsnSer SerThr ValAla He AspHis> 

10S0 

TCA CTATCT CTA GCA GGG GAA AGAACTTGG GCT GAAACA ATG GGTTTAAAT ACC GCTGAT 
AGT GAT AGA GAT CGT CCC CTT TCTTGAACC CGA CTT TGT TAC CCA AATTTA TGG CGACTA 
Ser LeuSer LeuAlaGlyGluArgThrTrpAlaGluThr MetGlyLeuAsnThr AlaAep> 

1140 

ACA GCAAGA TTA AAT GCC AAT ATTAGATAT GTA AAT ACT GGGACGGCTCCAATC TACAAC 
TGT CGTTCT AAT TTA CGG TTA TAATCTATA CAT TTATGA CCC TGC CGAGGT TAG ATGTTG 
Thr AlaArg LeuAenAlaAsn IleArgTyrVal AsnThr GlyThrAlaPro He TyrAen> 

1200 

*■ 

GTG TTACCA ACG ACT TCG TTA GTGTTAGGA AAA AATCAA ACA CTC GCG ACA ATT AAAGCT 
CAC AATGGT TGC TGA AGC AAT CACAATCCTTTT TTAGTT TGT GAG CGCTGT TAA TTT CGA 
Val LeuPro Thr Thr Ser Leu ValLeuGly Lys AsnGln Thr LeuAlaThr He LysA'^ 

1260 

AAG GAAAAC GAA TTA AGT CAA ATACTTGCA CCT AAT AAT TATTAT CCTTCT AAA AACTTG: 
TTC CTTTTG GTT AAT TCA GTT TATGAACGT GGA TTATTA ATAATA GG AAG A TTT TTGAAC 
Lys GluAsnGlnLeuSerGlnlleLeuAlaProAsnAsnTyrTyrProSer Lys AsnLeu> 

1320 

GCG CCAATC GCA TTA AAT GCA CAAGACGAT TTC AGTTCT ACT CCA ATT ACA ATG AATTAC 
CGC GGTTAG CGT AAT TTA CGT GTTCTG CTA AAG TCAAGA TGA GGT TAATGT TAC TTAATG 
AlaProIleAlaLeuAsnAlaGlnAspAspPheSerSerThrProIleThrMetAsnTyr>

1440

GGG AATATA GCAACA TAC AAT TTTGAAAAT GGA AGAGTG AGGGTG GATACA GGC TCGAAC 
CCC TTATAT GCT TGT ATG TTA AAACTTTTA CCT TCT CAC TCC CAC CTATGT CCG AGCTTG 
5 GlyAenlleAlaThrTyrAsnPheGluAsnGlyArgValArgValAepThrGlySerAsn 



1500 
* 

TGG AGTGAA GTG TTA CCG CAAATTCAAGAA ACA ACTGCA CGTATC ATT TTT AAT GGAAAA 
1 0 ACC TCACTT CAC AAT GGC GTT TAAGTTCTT TGT TGACGT GCATAG TAAAAA TTA CCTTTT 

TrpSerGluVal Leu Pro GlnlleGlnGluThrThrAlaArgllellePheAsnGlyLys 

1560 
* 

15 GAT TTAAAT CTG GTA GAAAGG CGGATAGCG GCG GTT AAT CCT AGT GATCCATTA GAAACG 

CTA AATTTA GAC CAT CTT TCC GCCTATCGC CGC CAATTA GGATCA CTAGGT AAT CTTTGC 
Asp LeuAsnLeuVal GluArgArglleAlaAla ValAsn ProSerAspProLeuGluThr 



1620 

20 

ACT AAACCG GAT ATG ACA TTA AAAGAAGCC CTT AAAATA GCATTT GGATTT AAC GAACCG 
TGATTTGGC CTA TAC TGT AAT TTTCTTCGG GAA TTTTAT CGTAAA CCTAAATTG CTTGGC 
Thr Lys Pro Asp Me t Thr Leu Ly BGluAla Leu Lys lie AlaPheGlyPhe Asn GluPro 



25 1680 



30 



AAT GGAAAC TTA CAA TAT CAA GGGAAAGAC ATA ACC GAA TTT GAT TTT AAT TTC GATCAA 
TTA CCTTTG AAT GTT ATA GTT CCCTTTCTG TAT TGG CTT AAA CTA AAA TTA AAG CTAGTT 
AsnGlyAsn Leu Gin Tyr Gin Gly Lys Asp lie ThrGlu PheAsp PheAsn Phe AspGln 



1740 



CAA ACATCT CAA AAT ATC AAG AAT CAG TTA GCG GAA TTA AAC GCA ACT AAC ATA TATACT 
GTT TGTAGA GTT TTA TAG TTC TTAGTCAAT CGC CTT AAT TTG CGT TGATTG TAT ATATGA 
35 GlnThrSerGlnAsn lie LysAsnGlnLeuAlaGluLeuAsnAlaThrAsnlle TyrThr 



1800 
* 

GTA TTA GAT AAA ATC AAA TTA AATGCAAAA ATG AAT ATT TTAATAAGAGAT AAA CGTTTT 
4 0 CAT AAT CTA TTT TAG TTT AAT TTACGTTTT TAC TTATAA AAT TAT TCT CTA TTT GCAAAA 

Val LeuAspLys lie Lys LeuAsnAlaLysMet Asnlle Leu lie Arg Asp Lys ArgPhe 



1860 

4 5 CAT TATGAT AG A AAT AAC ATA GCAGTTGGG GCG GATGAG TCAGTA GTT AAG GAG GCT CAT 

GTA ATACTA TCT TTA TTG TAT CGTCAACCC CGC CTACTC AGT CAT CAA TTC CTC CGAGTA 
His TyrAspArgAsnAsn IleAlaValGlyAlaAspGlu SerVal Val LysGlu AlaHis 



50 



60 



1920 



AGA GAAGTA ATT AAT TCG TCA ACA GAG GGA TTA TTG TTA AAT ATT GAT AAG GAT ATAAGA 
TCT CTT CAT TAA TTA AG C AGT TGT CTC CCT AAT AACAAT TTATAA CTATTC CTA TATTCT 
Arg GluVal IleAsn Ser SerThrGluGly Leu LeuLeu Asnlle Asp Lye Asp IleArg 

55 1980 

AAA ATA TTA TCA GOT TAT ATT 3TAGAF-ATT GAA GAT ACT GAAGGG CTT AAA GAA GTT ATA 
TTT TAT AAT AGT CCA ATA TAA CAT CTT TAA CTT CTATGA CTT CCC GAA ITT CTT CAATAT 
Lys lie Leu Ser Gly Tyr He ValGluIleGlu AspThr GluGlyLeuLys GluVal lie 



2040 



AAT GAC AGA TAT GAT ATG TTG AAT ATT TCT AGT TTACGG CAAGAT GGAAAA ACA TTT ATA 
TTA CTGTCT ATA CTA TAC AAC TTATAAAGA TCA AATGC C GTT CTA CCT TTT TGT AAATAT 
AsnAspArgTyrAspMetLeuAsnIleSerSerLeuArgGlnAspGlyLysThrPheIle

2100



GAT TTTAAA AAA TAT AAT GAT AAATTACCG TTA TAT ATA AGTAAT CCCAAT TAT AAGGTA 
CTA AAATTT ITT ATA TTA CTATTTAATGGC AAT ATATAT TCATTA GGGTTA ATA TTC CAT 
5 Asp PheLys Lys Tyr AsnAsp LysLeuProLeuTyrlle SerAsnProAsnTyr LysVal 



2160 
* 



AAT GTATAT GCT GTT ACT AAA GAAAACACT ATT ATT AAT CCT AGT GAG AAT GGG GATACT 
10 TTA CAT ATA CGA CAA TGA TIT CTTTTGTGATAA TAATTA GGATCA CTC TTA CCC CTATGA 
ABnValTyr AlaVal ThrLys GluAsnThrlle IleAsnProSerGluABnGlyAspThr 

2220 
* 

15 AGT ACCAAC GGGATC AAG AAA ATTTTAATC TTT TCT AAA AAAGGC TATGAG ATA GGATAA 

TCA TGGTTG CCC TAG TTC ITT TAAAATTAG AAA AGATTT TTT CCG ATA CTC TAT CCT ATT 
Ser ThrAsnGly lie Lys Lys IleLeuIle Phe SerLys LysGlyTyrGluIle Gly*** 

The above procedure was followed for PAHIV#1, 3 and 4.

Example 7: Cleavage of Mutant PAHIV Proteins in vitro.

The mutated proteins were treated with purified HIV-1
protease and evaluated for their degree of cleavage with
respect to time. The purified protease was obtained from the
NIH AIDS Research and Reference Reagent Program, Division of
AIDS, NIAID, Bethesda, MD. Alternatively, the protease can be
purified following the method of Louis, et al., Eur. J.
Biochem., 199:361 (1991).

Extended incubation (12 hours) of PA or the mutated PA
proteins with the purified HIV-1 protease resulted in the
appearance of two additional protein fragments that were not
anticipated. These two fragments are approximately 53
kilodaltons and 30 kilodaltons in size. This may represent
cleavage of PA and mutant PA proteins at a site recognized by
the HIV-1 protease between PA residues Y259 and P260. The
residues around this cleavage site, VAAYPIVHV (residues
256-264), have not previously been identified as a potential
HIV-1 protease cleavage site.
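
As a rough plausibility check of the fragment sizes quoted above, the expected masses of the two products of a cut between residues 259 and 260 can be estimated. This is a back-of-the-envelope sketch: the average residue mass of 110 Da is a general rule of thumb, not a value from the text, and the native PA length is inferred here from the 2220-nucleotide PAHIV#2 sequence range (including a stop codon) and the net gain of four residues introduced by cassette 2.

```python
# Illustrative sketch: the 110 Da average residue mass is an assumed
# rule of thumb, so results are approximate.
AVERAGE_RESIDUE_MASS_DA = 110.0

PAHIV2_SEQUENCE_NT = 2220                          # "Sequence Range: 1 to 2220"
pahiv2_length = PAHIV2_SEQUENCE_NT // 3 - 1        # subtract the stop codon
native_pa_length = pahiv2_length - (14 - 10)       # 14 residues replaced 10

CUT_AFTER_RESIDUE = 259                            # cleavage between Y259 and P260
n_fragment_residues = CUT_AFTER_RESIDUE
c_fragment_residues = native_pa_length - CUT_AFTER_RESIDUE

n_fragment_kda = n_fragment_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0
c_fragment_kda = c_fragment_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0

print(f"native PA length: {native_pa_length} residues")
print(f"N-terminal fragment: ~{n_fragment_kda:.1f} kDa")
print(f"C-terminal fragment: ~{c_fragment_kda:.1f} kDa")
```

Under these assumptions the estimates fall close to the observed sizes of about 30 and 53 kilodaltons, which is consistent with the proposed cleavage position but does not prove it.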

Incubation of RAW 264.7 cells (ATCC No. TIB 71) with
lethal factor (LF) and HIV-1 protease-cleaved PAHIV#1 or
PAHIV#4 caused cell death, demonstrating that the mutated PA
proteins are capable of binding to LF and thus the toxic LF/PE
fusion proteins. PAHIV, PAHIV#2 and PAHIV#3 have not yet been
tested.

Example 8: Evaluation of cytotoxic agents in cell cultures.

The ability of the PA constructs containing the HIV-1
protease cleavage site to promote killing of HIV-1 infected
cells is being evaluated in COS-1 cells (ATCC No. CRL 1650)
transfected with the vector HIV-gpt. When COS cells are
transfected with this plasmid vector they express all the
genes for the production of HIV-1 virus particles except the
envelope protein, gp160 (Page, K.A., et al., 1990. J. Virol.
64:5270-5276). Without the envelope protein the particles are
not infectious. These cells express the HIV-1 protease and
properly cleave the viral protein gp55 to gp24 (Page, K.A., et
al., 1990. J. Virol. 64:5270-5276). These properties make the
transfected cells an excellent model system in which to
evaluate the ability of protein constructs of the invention to
eliminate HIV-1 infected cells from culture.

The COS-1 cells were transfected with the plasmid
vector and the resulting cultures are being selected for
stable transfectants. The mutated PA proteins (PAHIV#1,
PAHIV#2, PAHIV#3 and PAHIV#4) are added to the culture media
of growing HIV-gpt transfected COS-1 cells in the presence of
the lethal factor fusion protein FP53 (Arora, N. et al. J.
Biol. Chem. 267:15542 (1992)). Only cells which properly
cleave the mutated PA proteins are able to bind the toxin LF
fusion protein. The cultures are evaluated for protein
expression (an indirect measure of viability) after 36 hours
(Arora, N. and S. H. Leppla. 1992. J. Biol. Chem. 268:3334).



Example 9: Treatment of an HIV-1 infected patient.

A human patient who is infected with HIV-1 is selected
for treatment. Although infected, this particular patient is
asymptomatic. The patient weighs 70 kilograms. A dose of 10
micrograms per kilogram, or 700 micrograms, of a PAHIV in normal
saline is prepared. This dosage is injected into the patient
intravenously as a bolus. The dose is repeated weekly for a
total of 4 to 6 dosages. The patient is evaluated regularly,
such as weekly, in terms of his symptoms, physical exam and
laboratory analysis according to the clinician's judgment.
Tests of particular interest include the patient's complete
blood count and examination for the presence of HIV infection.
The treatment regimen can be repeated with or without
alterations at the discretion of the clinician.
Incorporated by reference/paragraph before claims 
Unless defined otherwise, all technical and scientific
terms used herein have the same meaning as commonly understood
by one of ordinary skill in the art to which this invention
belongs. Although any methods and materials similar or
equivalent to those described can be used in the practice or
testing of the present invention, the preferred methods and
materials are now described. All publications and patent
documents referenced in this application are incorporated
herein by reference.

It is understood that the examples and embodiments
described herein are for illustrative purposes only and that
various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included
within the spirit and purview of this application and scope of
the appended claims.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
AAATTAGGAT TTCGGTTATG TTTAGTATTT TTTTAAAATA ATAGTATTAA ATAGTGGAAT 60 
GCAAATGATA AATGGGCTTT AAACAAAACT AATGAAATAA TCTACAAATG GAATTTCTCC 120 
AGTTTTAGAT TAAACCATAC CAAAAAAATC ACACTGTCAA GAAAAATGAT AGAATCCCTA 160 
CACTAATTAA CATAACCAAA TTGGTAGTTA TAGGTAGAAA CTTATTTATT TCTATAATAC 240 
CATGCAAAAA AGTAAATATT CTGTTCCATA CTATTTTAGT AAATTATTTA GCAAGTAAAT 300 
TTTGGTGTAT AAACAAAGTT TATCTTAATA TAAAAAATTA CTTTACTTTT ATACAGATTA 360 
15 AAATGAAAAA TTTTTTATGA CAAGAAATAT TGCCTTTAAT TTATGAGGAA ATAAGTAAAA 420 

TTTTCTACAT ACTTTATTTT ATTGTTGAAA TGTTCACTTA TAAAAAAGGA GAGATTAAAT 480 
ATGAATATAA AAAAAGAATT TATAAAAGTA ATTAGTATGT CATGTTTAGT AACAGCAATT 540 

20 

ACTTTGAGTG GTCCCGTCTT TATCCCCCTT GTACAGGGG GCG GGC GGT CAT GGT 594 

Ala Gly Gly His Gly 
1 5 

25 GAT GTA GGT ATG CAC GTA AAA GAG AAA GAG AAA AAT AAA GAT GAG AAT 642 

Asp Val Gly Met His Val Lys Glu Lys Glu Lys Asn Lye Asp Glu Asn 
10 15 20 

AAG AGA AAA GAT GAA GAA CGA AAT AAA ACA CAG GAA GAG CAT TTA AAG 690 
30 Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gin Glu Glu His Leu Lys 

25 30 35 

GAA ATC ATG AAA CAC ATT GTA AAA ATA GAA GTA AAA GGG GAG GAA GCT 738 
Glu He Met Lys His He Val Lys He Glu Val Lys Gly Glu Glu Ala 
35 40 45 50 

GTT AAA AAA GAG GCA GCA GAA AAG CTA CTT GAG AAA GTA CCA TCT GAT 786 
Val Lys Lys Glu Ala Ala Glu Lye Leu Leu Glu Lys Val Pro Ser Asp 
55 60 65 

40 

GTT TTA GAG ATG TAT AAA GCA ATT GGA GGA AAG ATA TAT ATT GTG GAT 83 4 

Val Leu Glu Met Tyr Lys Ala He Gly Gly Lys He Tyr He Val Asp 
70 75 80 85 

45 GGT GAT ATT ACA AAA CAT ATA TCT TTA GAA GCA TTA TCT GAA GAT AAG 882 

Gly Asp He Thr Lys His He Ser Leu Glu Ala Leu Ser Glu Asp Lys 
90 95 100 

AAA AAA ATA AAA GAC ATT TAT GGG AAA GAT GCT TTA TTA CAT GAA CAT 93 0 

50 Lyp Lys He Lys Asp He Tyr G'ly Lys Asp Ala Leu Le\J His Glu His 

105 110 115 

TAT GTA TAT GCA AAA GAA GGA TAT GAA CCC GTA CTT GTA ATC CAA TCT 9'M 
Tyr Val Tyr Ala Lys Glu Gly Tvx Glu Pro Val Leu Val He Gin Ser 
55 120 125 13 3 

TCG GAA GAT TAT GTA GAA AAT ACT GAA AAG GCA CTG A."Y2 GTT TAT T V" ) 0 .1 S 

Ser Glu Asp Tyr Val Glu Asn Thr Giu Lys Ala Le>u aj.i tfal Tyr Tv„ 
135 140 145 



60 



GAA ATA GGT AAG ATA TTA TCA AGG GAT ATT TTA AGT AAA ATT AAT CAA 107 4 

Glu lie Gly Lys He Leu Ser Arg Asp He Leu Ser Lys He Asn Gin 
150 155 160 165 



6 5 CCA TAT CAG AAA TTT TTA GAT GTA TTA AAT ACC ATT AAA AAT GCA TC~ 112 I 

Pro Tyr Gin Lys Phe Leu Asp Val Leu Asn Thr He Lys Asn Ala Ser 
170 175 180 

GAT TCA GAT GGA CAA GAT CTT TTA TTT ACT AAT CAG CTT AAG GAA CAT 1170
Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn Gln Leu Lys Glu His
185 190 195 

CCC ACA GAC ITT TCT GTA GAA TTC TTG GAA CAA AAT AGC AAT GAG GTA 1218 
5 Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gin Asn Ser Asn Glu Val 
200 205 210 

CAA GAA GTA TTT GCG AAA GCT TTT GCA TAT TAT ATC GAG CCA CAG CAT 1266 
Gin Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr lie Glu Pro Gin His 
10 215 220 225 

CGT GAT GTT TTA CAG CTT TAT GCA CCG GAA GCT TTT AAT TAC ATG GAT 1314 
Arg Asp Val Leu Gin Leu Tyr Ala Pro Glu Ala Phe Asn Tyr Met Asp 
230 235 240 245 



15 



35 



55 



AAA TTT AAC GAA CAA GAA ATA AAT CTA TCC TTG GAA GAA CTT AAA GAT 1362 
Lys Phe Asn Glu Gin Glu lie Asn Leu Ser Leu Glu Glu Leu Lys Asp 
250 255 260 



20 CAA CGG ATG CTG TCA AGA TAT GAA AAA TGG GAA AAG ATA AAA CAG CAC 1410 

Gin Arg Met Leu Ser Arg Tyr Glu Lys Trp Glu Lys lie Lys Gin His 
265 270 275 

TAT CAA CAC TGG AGC GAT TCT TTA TCT GAA GAA GGA AGA GGA CTT TTA 1458 
25 Tyr Gin His Trp Ser Asp Ser Leu Ser Glu Glu Gly Arg Gly Leu Leu 
280 285 290 

AAA AAG CTG CAG ATT CCT ATT GAG GCA AAG AAA GAT GAC ATA ATT CAT 1506 
Lys Lys Leu Gin He Pro He Glu Pro Lys Lys Asp Asp He He His 
30 295 300 305 

TCT TTA TCT CAA GAA GAA AAA GAG CTT CTA AAA AGA ATA CAA ATT GAT 1554 
Ser Leu Ser Gin Glu Glu Lys Glu Leu Leu Lys Arg He Gin He Asp 
310 315 320 325 



AGT AGT GAT TTT TTA TCT ACT GAG GAA AAA GAG TTT TTA AAA AAG CTA 1602 
Ser Ser Asp Phe Leu Ser Thr Glu Glu Lys Glu Phe Leu Lys Lys Leu 
330 335 340 



40 CAA ATT GAT ATT CGT GAT TCT TTA TCT GAA GAA GAA AAA GAG CTT TTA 1650 

Gin He Asp He Arg Asp Ser Leu Ser Glu Glu Glu Lys Glu Leu Leu 
345 350 355 

AAT AGA ATA CAG GTG GAT AGT AGT AAT CCT TTA TCT GAA AAA GAA AAA 1698 
45 Asn Arg He Gin Val Asp Ser Ser Asn Pro Leu Ser Glu Lys Glu Lys 
360 365 370 

GAG TTT TTA AAA AAG CTG AAA CTT GAT ATT CAA CCA TAT GAT ATT AAT 174 6 

Glu Phe Leu Lye Lys Leu Lys Leu Asp He Gin Pro Tyr Asp He Asn 
50 375 380 385 

CAA AGG TTG CAA GAT ACA GGA GGG TTA ATT GAT AGT CCG TC* ATT AAT 3 794 

Gin A:-g Leu Gin Asp Thr Gly Gly Leu He Asp Ser Pro Ser He Asn 
390 395 400 405 



CTT GAT GTA AGA AAG CAG TAT AAA AGG GAT ATT CAA AAT ATT GAT GCT 1 

Leu A:.;p ■ Rx' r < Lys Gin Tyr L/s Arg A:;p He Gin Asn He Asp A J j 
410 415 420 



60 TTA TTA CAT CAA TCC ATT GGA AGT ACC TTG TAC AAT AAA ATT TAT TTG 1890 

Leu Leu His Gin Ser lie Gly Ser Thr Leu Tyr Asn Lys lie Tyr Leu 
425 430 435 

TAT GAA AAT ATG AAT ATC AAT AAC CTT ACA GCA ACC CTA GGT GCG GAT 193fi 
65 Tyr G xu Asn Met Asn He Asn Asn Leu Thr Ala Thr Leu Gly Ala Asp 
440 445 450 



TTA GTT GAT TCC ACT GAT AAT ACT AAA ATT AAT AGA GGT ATT TTC AAT 1986
Leu Val Asp Ser Thr Asp Asn Thr Lys Ile Asn Arg Gly Ile Phe Asn
455 460 465

GAA TTC AAA AAA AAT TTC AAA TAT AGT ATT TCT AGT AAC TAT ATG ATT 2034 
Glu Phe Lys Lys Asn Phe Lys Tyr Ser lie Ser Ser Asn Tyr Met He 
470 475 480 485 

GTT GAT ATA AAT GAA AGG CCT GCA TTA GAT AAT GAG CGT TTG AAA TGG 2082 
Val Asp He Asn Glu Arg Pro Ala Leu Asp Asn Glu Arg Leu Lys Trp 
490 495 500 

AGA ATC CAA TTA TCA CCA GAT ACT CGA GCA GGA TAT TTA GAA AAT GGA 2130 
Arg He Gin Leu Ser Pro Asp Thr Arg Ala Gly Tyr Leu Glu Asn Gly 
505 510 515 



15 AAG CTT ATA TTA CAA AGA AAC ATC GGT CTG GAA ATA AAG GAT GTA CAA 2178 

Lys Leu He Leu Gin Arg Asn He Gly Leu Glu lie Lys Asp Val Gin 
520 525 530 

ATA ATT AAG CAA TCC GAA AAA GAA TAT ATA AGG ATT GAT GCG AAA GTA 2226 
20 lie He Lys Gin Ser Glu Lys Glu Tyr He Arg He Asp Ala Lye Val 
535 540 545 

GTG CCA AAG AGT AAA ATA GAT ACA AAA ATT CAA GAA GCA CAG TTA AAT 2274 
Val Pro Lys Ser Lys He Asp Thr Lys He Gin Glu Ala Gin Leu Asn 
25 550 555 560 565 

ATA AAT CAG GAA TGG AAT AAA GCA TTA GGG TTA CCA AAA TAT ACA AAG 2322 
He Asn Gin Glu Trp Asn Lys Ala Leu Gly Leu Pro Lys Tyr Thr Lys 
570 575 580 

30 

CTT ATT ACA TTC AAC GTG CAT AAT AGA TAT GCA TCC AAT ATT GTA GAA 2370 
Leu He Thr Phe Asn Val His Asn Arg Tyr Ala Ser Asn He Val Glu 
585 590 595 

3 5 AGT GCT TAT TTA ATA TTG AAT GAA TGG AAA AAT AAT ATT CAA AGT GAT 2418 

Ser Ala Tyr Leu He Leu Asn Glu Trp Lys Asn Asn He Gin Ser Asp 
600 605 610 

CTT ATA AAA AAG GTA ACA AAT TAC TTA GTT GAT GGT AAT GGA AGA TTT 2466 
40 Leu He Lys Lys Val Thr Asn Tyr Leu Val Asp Gly Asn Gly Arg Phe 
615 620 625 

GTT TTT ACC GAT ATT ACT CTC CCT AAT ATA GCT GAA CAA TAT ACA CAT 2514 
Val Phe Thr Asp He Thr Leu Pro Asn He Ala Glu Gin Tyr Thr His 
45 630 635 640 645 

CAA GAT GAG ATA TAT GAG CAA GTT CAT TCA AAA GGG TTA TAT GTT CCA 2562
Gln Asp Glu Ile Tyr Glu Gln Val His Ser Lys Gly Leu Tyr Val Pro
650 655 660

GAA TCC CGT TCT ATA TTA CTC CAT GGA CCT TCA AAA GGT GTA GAA TTA 2610
Glu Ser Arg Ser Ile Leu Leu His Gly Pro Ser Lys Gly Val Glu Leu
665 670 675

AGG AAT GAT AGT GAG GGT TTT ATA CAC GAA TTT GGA CAT GCT GTG GAT 2658
Arg Asn Asp Ser Glu Gly Phe Ile His Glu Phe Gly His Ala Val Asp
680 685 690

GAT TAT GCT GGA TAT CTA TTA GAT AAG AAC CAA TCT GAT TTA GTT ACA 2706
Asp Tyr Ala Gly Tyr Leu Leu Asp Lys Asn Gln Ser Asp Leu Val Thr
695 700 705

AAT TCT AAA AAA TTC ATT GAT ATT TTT AAG GAA GAA GGG AGT AAT TTA 2754
Asn Ser Lys Lys Phe Ile Asp Ile Phe Lys Glu Glu Gly Ser Asn Leu
710 715 720 725

ACT TCG TAT GGG AGA ACA AAT GAA GCG GAA TTT TTT GCA GAA GCC TTT 2802
Thr Ser Tyr Gly Arg Thr Asn Glu Ala Glu Phe Phe Ala Glu Ala Phe
730 735 740

AGG TTA ATG CAT TCT ACG GAC CAT GCT GAA CGT TTA AAA GTT CAA AAA 2850
Arg Leu Met His Ser Thr Asp His Ala Glu Arg Leu Lys Val Gln Lys
745 750 755

AAT GCT CCG AAA ACT TTC CAA TTT ATT AAC GAT CAG ATT AAG TTC ATT 2898
Asn Ala Pro Lys Thr Phe Gln Phe Ile Asn Asp Gln Ile Lys Phe Ile
760 765 770

ATT AAC TCA TAAGTAATGT ATTAAAAATT TTCAAATGGA TTTAATAATA 2947
Ile Asn Ser
775

ATAATAATAA TAATAATAAC GGGACCAGCC ATTATGAAGC AACTAATTCT AGACTTGATA 3007 

GTAATTCTTG GGAAGCACCA GATAGTGTAA AAGGTGGCAT TGCCAGAATG ATATTTTATG 3067 

TGTTCGTTAG ATATGAAGGC AAAAACAATG ATCCTGACCT AGAACTTAAT GATAATGTTA 3127 

TTAATAATTT AATGCCTTTT ATAGGAATAT TAGTAAAAGT GCCGAAAAGA TCCTGTTGCA 3187 

AAGCTTTTAA AGAACATATT ATTCTATCAA GTGGCTGTAT ATTTTGTGTA ATTTTCAATA 3247 

AATTTTGTAA TTAAGCATAC GTCAAAAAAC CGAAATCTGA GCTC 3291 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 776 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys
1 5 10 15

Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30

Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45

Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu
50 55 60

Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80

Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95

Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110

Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val
115 120 125

Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140

Leu Asn Val Tyr Tyr Gln Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu
145 150 155 160

Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175

Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190

Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205

Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220

Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240






Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Ser Leu
245 250 255

Glu Glu Leu Lys Asp Gln Arg Met Leu Ser Arg Tyr Glu Lys Trp Glu
260 265 270

Lys Ile Lys Gln His Tyr Gln His Trp Ser Asp Ser Leu Ser Glu Glu
275 280 285

Gly Arg Gly Leu Leu Lys Lys Leu Gln Ile Pro Ile Glu Pro Lys Lys
290 295 300

Asp Asp Ile Ile His Ser Leu Ser Gln Glu Glu Lys Glu Leu Leu Lys
305 310 315 320

Arg Ile Gln Ile Asp Ser Ser Asp Phe Leu Ser Thr Glu Glu Lys Glu
325 330 335

Phe Leu Lys Lys Leu Gln Ile Asp Ile Arg Asp Ser Leu Ser Glu Glu
340 345 350

Glu Lys Glu Leu Leu Asn Arg Ile Gln Val Asp Ser Ser Asn Pro Leu
355 360 365

Ser Glu Lys Glu Lys Glu Phe Leu Lys Lys Leu Lys Leu Asp Ile Gln
370 375 380

Pro Tyr Asp Ile Asn Gln Arg Leu Gln Asp Thr Gly Gly Leu Ile Asp
385 390 395 400

Ser Pro Ser Ile Asn Leu Asp Val Arg Lys Gln Tyr Lys Arg Asp Ile
405 410 415

Gln Asn Ile Asp Ala Leu Leu His Gln Ser Ile Gly Ser Thr Leu Tyr
420 425 430

Asn Lys Ile Tyr Leu Tyr Glu Asn Met Asn Ile Asn Asn Leu Thr Ala
435 440 445

Thr Leu Gly Ala Asp Leu Val Asp Ser Thr Asp Asn Thr Lys Ile Asn
450 455 460

Arg Gly Ile Phe Asn Glu Phe Lys Lys Asn Phe Lys Tyr Ser Ile Ser
465 470 475 480

Ser Asn Tyr Met Ile Val Asp Ile Asn Glu Arg Pro Ala Leu Asp Asn
485 490 495

Glu Arg Leu Lys Trp Arg Ile Gln Leu Ser Pro Asp Thr Arg Ala Gly
500 505 510

Tyr Leu Glu Asn Gly Lys Leu Ile Leu Gln Arg Asn Ile Gly Leu Glu
515 520 525

Ile Lys Asp Val Gln Ile Ile Lys Gln Ser Glu Lys Glu Tyr Ile Arg
530 535 540

Ile Asp Ala Lys Val Val Pro Lys Ser Lys Ile Asp Thr Lys Ile Gln
545 550 555 560

Glu Ala Gln Leu Asn Ile Asn Gln Glu Trp Asn Lys Ala Leu Gly Leu
565 570 575

Pro Lys Tyr Thr Lys Leu Ile Thr Phe Asn Val His Asn Arg Tyr Ala
580 585 590

Ser Asn Ile Val Glu Ser Ala Tyr Leu Ile Leu Asn Glu Trp Lys Asn
595 600 605

Asn Ile Gln Ser Asp Leu Ile Lys Lys Val Thr Asn Tyr Leu Val Asp
610 615 620

Gly Asn Gly Arg Phe Val Phe Thr Asp Ile Thr Leu Pro Asn Ile Ala
625 630 635 640

Glu Gln Tyr Thr His Gln Asp Glu Ile Tyr Glu Gln Val His Ser Lys
645 650 655

Gly Leu Tyr Val Pro Glu Ser Arg Ser Ile Leu Leu His Gly Pro Ser
660 665 670

Lys Gly Val Glu Leu Arg Asn Asp Ser Glu Gly Phe Ile His Glu Phe
675 680 685

Gly His Ala Val Asp Asp Tyr Ala Gly Tyr Leu Leu Asp Lys Asn Gln
690 695 700

Ser Asp Leu Val Thr Asn Ser Lys Lys Phe Ile Asp Ile Phe Lys Glu
705 710 715 720

Glu Gly Ser Asn Leu Thr Ser Tyr Gly Arg Thr Asn Glu Ala Glu Phe
725 730 735

Phe Ala Glu Ala Phe Arg Leu Met His Ser Thr Asp His Ala Glu Arg
740 745 750

Leu Lys Val Gln Lys Asn Ala Pro Lys Thr Phe Gln Phe Ile Asn Asp
755 760 765

Gln Ile Lys Phe Ile Ile Asn Ser
770 775

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4235 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1891..4095

(D) OTHER INFORMATION: /product= "Protective Antigen"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

AAGCTTCTGT CATTCGTAAA TTTCAAATAG AACGTAAATT TAGACTTCTC ATCATTAAAA 60

ATGAAAAATC TTATCTTTTT GATTCTATTG TATATTTTTA TTAAGGTGTT TAATAGTTAG 120

AAAAGACAGT TGATGCTATT ACTCCAGATA AAATATAGCT AACCATAAAT TTATTAAAGA 180

AACCTTGTTG TTCTAAATAA TGATTTTGTG GATTCCGGAA TAGATACTGG TGAGTTAGCT 240

CTAATTTTAT AGTGATTTAA CTAACAATTT ATAAAGCAGC ATAATTCAAA TTTTTTAATT 300

GATTTTTCCT GAAGCATAGT ATAAAAGAGT CAAGGTCTTC TAGACTTGAC TCTTGGAATC 360

ATTAGGAATT AACAATATAT ATAATGCGCT AGACAGAATC AAATTAAATG CAAAAATGAA 420

TATTTTAGTA AGAGATCCAT ATCATTATGA TAATAACGGT AATATTGTAG GGGTTGATGA 480

TTCATATTTA AAAAACGCAT ATAAGCAAAT ACTTAATTGG TCAAGCGATG GAGTTTCTTT 540

AAATCTAGAT GAAGATGTAA ATCAAGCACT ATCTGGATAT ATGCTTCAAA TAAAAAAACC 600

TTCAAACCAC CTAACAAACA GCCCAGTTAC AATTACATTA GCAGGCAAGG ACAGTGGTGT 660

TGGAGAATTG TATAGAGTAT TATCAGATGG AGCAGGATTC CTGGATTTCA ATAAGTTTGA 720

TGAAAATTGG CGATCATTAG TAGATCCTGG TGATGATGTT TATGTGTATG CTGTTACTAA 780

AGAAGATTTT AATGCAGTTA CTCGAGATGA AAATGGTAAT ATAGCGAATA AATTAAAAAA 840

CACCTTAGTT TTATCGGGTA AAATAAAAGA AATAAACATA AAAACTACAA ATATTAATAT 900

ATTTGTAGTT TTTATGTTTA TTATATACCT CCTATTTTAT ATTATTAGTA GCACAGTTTT 960

TGCAAATCAT GTAATTGTAT ACTTATCTAT GTAGAGGTAT CACAACTTAT GAATAGTGTA 1020

TTTTATTGAA CGTTGGTTAG CTTGGACAGT TGTATGGATA TGCATACTTT ATAACGTATA 1080

AAATTTCACG CACCACAATA AAACTAATTT AACAAAAACA AAAACACACC TAAGATCATT 1140

CAGTTCTTTT AATAAGGAGC TGCCCACCAA GCTAAACCTA AATAATCTTT GTTTCACATA 1200

AGGTTTTTTT CTAAATATAC AGTGTAAGTT ATTGTGAATT TAACCAGTAT ATATTAAAAA 1260

TGTTTTATGT TAACAAATTA AATTGTAAAA CCCCTCTTAA GCATAGTTAA GAGGGGTAGG 1320

TTTTAAATTT TTTGTTGAAA TTAGAAAAAA TAATAAAAAA ACAAACCTAT TTTCTTTCAG 1380

GTTGTTTTTG GGTTACAAAA CAAAAAGAAA ACATGTTTCA AGGTACAATA ATTATGGTTC 1440

TTTAGCTTTC TGTAAAACAG CCTTAATAGT TGGATTTATG ACTATTAAAG TTAGTATACA 1500

GCATACACAA TCTATTGAAG GATATTTATA ATGCAATTCC CTAAAAATAG TTTTGTATAA 1560

CCAGTTCTTT TATCCGAACT GATACACGTA TTTTAGCATA ATTTTTAATG TATCTTCAAA 1620

AACAGCTTCT GTGTCCTTTT CTATTAAACA TATAAATTCT TTTTTATGTT ATATATTTAT 1680

AAAAGTTCTG TTTAAAAAGC CAAAAATAAA TAATTATCTC TTTVTATTTA TATTATATTG 1740

AAACTAAAGT TTATTAATTT CAATATAATA TAAATTTAAT TTTATACAAA AAGGAGAACG 1800

TATATGAAAA AACGAAAAGT GTTAATACCA TTAATGGCAT TGTCTACGAT ATTAGTTTCA 1860

AGCACAGGTA ATTTAGAGGT GATTCAGGCA GAA GTT AAA CAG GAG AAC CGG TTA 1914
Glu Val Lys Gln Glu Asn Arg Leu
1 5

TTA AAT GAA TCA GAA TCA AGT TCC CAG GGG TTA CTA GGA TAC TAT TTT 1962
Leu Asn Glu Ser Glu Ser Ser Ser Gln Gly Leu Leu Gly Tyr Tyr Phe
10 15 20

AGT GAT TTG AAT TTT CAA GCA CCC ATG GTG GTT ACC TCT TCT ACT ACA 2010
Ser Asp Leu Asn Phe Gln Ala Pro Met Val Val Thr Ser Ser Thr Thr
25 30 35 40

GGG GAT TTA TCT ATT CCT AGT TCT GAG TTA GAA AAT ATT CCA TCG GAA 2058
Gly Asp Leu Ser Ile Pro Ser Ser Glu Leu Glu Asn Ile Pro Ser Glu
45 50 55

AAC CAA TAT TTT CAA TCT GCT ATT TGG TCA GGA TTT ATC AAA GTT AAG 2106
Asn Gln Tyr Phe Gln Ser Ala Ile Trp Ser Gly Phe Ile Lys Val Lys
60 65 70

AAG AGT GAT GAA TAT ACA TTT GCT ACT TCC GCT GAT AAT CAT GTA ACA 2154
Lys Ser Asp Glu Tyr Thr Phe Ala Thr Ser Ala Asp Asn His Val Thr
75 80 85

ATG TGG GTA GAT GAC CAA GAA GTG ATT AAT AAA GCT TCT AAT TCT AAC 2202
Met Trp Val Asp Asp Gln Glu Val Ile Asn Lys Ala Ser Asn Ser Asn
90 95 100



AAA ATC AGA TTA GAA AAA GGA AGA TTA TAT CAA ATA AAA ATT CAA TAT 2250
Lys Ile Arg Leu Glu Lys Gly Arg Leu Tyr Gln Ile Lys Ile Gln Tyr
105 110 115 120

CAA CGA GAA AAT CCT ACT GAA AAA GGA TTG GAT TTC AAG TTG TAC TGG 2298
Gln Arg Glu Asn Pro Thr Glu Lys Gly Leu Asp Phe Lys Leu Tyr Trp
125 130 135

ACC GAT TCT CAA AAT AAA AAA GAA GTG ATT TCT AGT GAT AAC TTA CAA 2346
Thr Asp Ser Gln Asn Lys Lys Glu Val Ile Ser Ser Asp Asn Leu Gln
140 145 150

TTG CCA GAA TTA AAA CAA AAA TCT TCG AAC TCA AGA AAA AAG CGA AGT 2394
Leu Pro Glu Leu Lys Gln Lys Ser Ser Asn Ser Arg Lys Lys Arg Ser
155 160 165

ACA AGT GCT GGA CCT ACG GTT CCA GAC CGT GAC AAT GAT GGA ATC CCT 2442
Thr Ser Ala Gly Pro Thr Val Pro Asp Arg Asp Asn Asp Gly Ile Pro
170 175 180

GAT TCA TTA GAG GTA GAA GGA TAT ACG GTT GAT GTC AAA AAT AAA AGA 2490
Asp Ser Leu Glu Val Glu Gly Tyr Thr Val Asp Val Lys Asn Lys Arg
185 190 195 200

ACT TTT CTT TCA CCA TGG ATT TCT AAT ATT CAT GAA AAG AAA GGA TTA 2538
Thr Phe Leu Ser Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu
205 210 215

ACC AAA TAT AAA TCA TCT CCT GAA AAA TGG AGC ACG GCT TCT GAT CCG 2586
Thr Lys Tyr Lys Ser Ser Pro Glu Lys Trp Ser Thr Ala Ser Asp Pro
220 225 230

TAC AGT GAT TTC GAA AAG GTT ACA GGA CGG ATT GAT AAG AAT GTA TCA 2634
Tyr Ser Asp Phe Glu Lys Val Thr Gly Arg Ile Asp Lys Asn Val Ser
235 240 245

CCA GAG G"J\ AGA CAC CCC CTT GTG GCA GCT TAT C7C ATT GTA CAT CT£ 2682
Pro Glu Ala Arg His Pro Leu Val Ala Ala Tyr Pro Ile Val His Val
250 255 260

GAT ATG GAG AAT ATT ATT CTC TCA AAA AAT GAG GAT CAA TCC ACA CAG 2730
Asp Met Glu Asn Ile Ile Leu Ser Lys Asn Glu Asp Gln Ser Thr Gln
265 270 275 280

AAT ACT GAT AGT GAA ACG AGA ACA ATA AGT AAA AAT ACT TCT ACA AGT 2778
Asn Thr Asp Ser Glu Thr Arg Thr Ile Ser Lys Asn Thr Ser Thr Ser
285 290 295

AGG ACA CAT ACT AGT GAA GTA CAT GGA AAT GCA GAA GTG CAT GCG TCG 2826
Arg Thr His Thr Ser Glu Val His Gly Asn Ala Glu Val His Ala Ser
300 305 310

TTC TTT GAT ATT GGT GGG AGT GTA TCT GCA GGA TTT AGT AAT TCG AAT 2874
Phe Phe Asp Ile Gly Gly Ser Val Ser Ala Gly Phe Ser Asn Ser Asn
315 320 325

TCA AGT ACG GTC GCA ATT GAT CAT TCA CTA TCT CTA GCA GGG GAA AGA 2922
Ser Ser Thr Val Ala Ile Asp His Ser Leu Ser Leu Ala Gly Glu Arg
330 335 340

ACT TGG GCT GAA ACA ATG GGT TTA AAT ACC GCT GAT ACA GCA AGA TTA 2970
Thr Trp Ala Glu Thr Met Gly Leu Asn Thr Ala Asp Thr Ala Arg Leu
345 350 355 360

AAT GCC AAT ATT AGA TAT GTA AAT ACT GGG ACG GCT CCA ATC TAC AAC 3018
Asn Ala Asn Ile Arg Tyr Val Asn Thr Gly Thr Ala Pro Ile Tyr Asn
365 370 375

GTG TTA CCA ACG ACT TCG TTA GTG TTA GGA AAA AAT CAA ACA CTC GCG 3066
Val Leu Pro Thr Thr Ser Leu Val Leu Gly Lys Asn Gln Thr Leu Ala
380 385 390

ACA ATT AAA GCT AAG GAA AAC CAA TTA AGT CAA ATA CTT GCA CCT AAT 3114
Thr Ile Lys Ala Lys Glu Asn Gln Leu Ser Gln Ile Leu Ala Pro Asn
395 400 405

AAT TAT TAT CCT TCT AAA AAC TTG GCG CCA ATC GCA TTA AAT GCA CAA 3162
Asn Tyr Tyr Pro Ser Lys Asn Leu Ala Pro Ile Ala Leu Asn Ala Gln
410 415 420

GAC GAT TTC AGT TCT ACT CCA ATT ACA ATG AAT TAC AAT CAA TTT CTT 3210
Asp Asp Phe Ser Ser Thr Pro Ile Thr Met Asn Tyr Asn Gln Phe Leu
425 430 435 440



GAG TTA GAA AAA ACG AAA CAA TTA AGA TTA GAT ACG GAT CAA GTA TAT 3258
Glu Leu Glu Lys Thr Lys Gln Leu Arg Leu Asp Thr Asp Gln Val Tyr
445 450 455

GGG AAT ATA GCA ACA TAC AAT TTT GAA AAT GGA AGA GTG AGG GTG GAT 3306
Gly Asn Ile Ala Thr Tyr Asn Phe Glu Asn Gly Arg Val Arg Val Asp
460 465 470

ACA GGC TCG AAC TGG AGT GAA GTG TTA CCG CAA ATT CAA GAA ACA ACT 3354
Thr Gly Ser Asn Trp Ser Glu Val Leu Pro Gln Ile Gln Glu Thr Thr
475 480 485

GCA CGT ATC ATT TTT AAT GGA AAA GAT TTA AAT CTG GTA GAA AGG CGG 3402
Ala Arg Ile Ile Phe Asn Gly Lys Asp Leu Asn Leu Val Glu Arg Arg
490 495 500

ATA GCG GCG GTT AAT CCT AGT GAT CCA TTA GAA ACG ACT AAA CCG GAT 3450
Ile Ala Ala Val Asn Pro Ser Asp Pro Leu Glu Thr Thr Lys Pro Asp
505 510 515 520

ATG ACA TTA AAA GAA GCC CTT AAA ATA GCA TTT GGA TTT AAC GAA CCG 3498
Met Thr Leu Lys Glu Ala Leu Lys Ile Ala Phe Gly Phe Asn Glu Pro
525 530 535

AAT GGA AAC TTA CAA TAT CAA GGG AAA GAC ATA ACC GAA TTT GAT TTT 3546
Asn Gly Asn Leu Gln Tyr Gln Gly Lys Asp Ile Thr Glu Phe Asp Phe
540 545 550

AAT TTC GAT CAA CAA ACA TCT CAA AAT ATC AAG AAT CAG TTA GCG GAA 3594
Asn Phe Asp Gln Gln Thr Ser Gln Asn Ile Lys Asn Gln Leu Ala Glu
555 560 565

TTA AAC GCA ACT AAC ATA TAT ACT GTA TTA GAT AAA ATC AAA TTA AAT 3642
Leu Asn Ala Thr Asn Ile Tyr Thr Val Leu Asp Lys Ile Lys Leu Asn
570 575 580

GCA AAA ATG AAT ATT TTA ATA AGA GAT AAA CGT TTT CAT TAT GAT AGA 3690
Ala Lys Met Asn Ile Leu Ile Arg Asp Lys Arg Phe His Tyr Asp Arg
585 590 595 600

AAT AAC ATA GCA GTT GGG GCG GAT GAG TCA GTA GTT AAG GAG GCT CAT 3738
Asn Asn Ile Ala Val Gly Ala Asp Glu Ser Val Val Lys Glu Ala His
605 610 615

AGA GAA GTA ATT AAT TCG TCA ACA GAG GGA TTA TTG TTA AAT ATT GAT 3786
Arg Glu Val Ile Asn Ser Ser Thr Glu Gly Leu Leu Leu Asn Ile Asp
620 625 630

AAG GAT ATA AGA AAA ATA TTA TCA GGT TAT ATT GTA GAA ATT GAA GAT 3834
Lys Asp Ile Arg Lys Ile Leu Ser Gly Tyr Ile Val Glu Ile Glu Asp
635 640 645

ACT GAA GGG CTT AAA GAA GTT ATA AAT GAC AGA TAT GAT ATG TTG AAT 3882
Thr Glu Gly Leu Lys Glu Val Ile Asn Asp Arg Tyr Asp Met Leu Asn
650 655 660

ATT TCT AGT TTA CGG CAA GAT GGA AAA ACA TTT ATA GAT TTT AAA AAA 3930
Ile Ser Ser Leu Arg Gln Asp Gly Lys Thr Phe Ile Asp Phe Lys Lys
665 670 675 680

TAT AAT GAT AAA TTA CCG TTA TAT ATA AGT AAT CCC AAT TAT AAG GTA 3978
Tyr Asn Asp Lys Leu Pro Leu Tyr Ile Ser Asn Pro Asn Tyr Lys Val
685 690 695



AAT GTA TAT GCT GTT ACT AAA GAA AAC ACT ATT ATT AAT CCT AGT GAG 4026
Asn Val Tyr Ala Val Thr Lys Glu Asn Thr Ile Ile Asn Pro Ser Glu
700 705 710

AAT GGG GAT ACT AGT ACC AAC GGG ATC AAG AAA ATT TTA ATC TTT TCT 4074
Asn Gly Asp Thr Ser Thr Asn Gly Ile Lys Lys Ile Leu Ile Phe Ser
715 720 725

AAA AAA GGC TAT GAG ATA GGA TAAGGTAATT CTAGGTGATT TTTAAATTAT 4125
Lys Lys Gly Tyr Glu Ile Gly
730 735

CTAAAAAACA GTAAAATTAA AACATACTCT TTTTGTAAGA AATACAAGGA GAGTATGTTT 4185

TAAACAGTAA TCTAAATCAT CATAATCCTT TGAGATTGTT TGTAGGATCC 4235

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 735 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15

Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30

Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45

Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60

Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80

Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95

Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg
100 105 110

Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125






Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140

Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser
145 150 155 160

Ser Asn Ser Arg Lys Lys Arg Ser Thr Ser Ala Gly Pro Thr Val Pro
165 170 175

Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val Glu Gly Tyr
180 185 190

Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser
195 200 205

Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro Glu
210 215 220

Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val Thr
225 230 235 240

Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu Val
245 250 255

Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu Ser
260 265 270

Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Glu Thr Arg Thr
275 280 285

Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val His
290 295 300

Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser Val
305 310 315 320

Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp His
325 330 335

Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly Leu
340 345 350

Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val Asn
355 360 365

Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu Val
370 375 380

Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Lys Glu Asn Gln
385 390 395 400

Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn Leu
405 410 415

Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser Ser Thr Pro Ile
420 425 430

Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu
435 440 445

Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
450 455 460

Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val
465 470 475 480

Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
485 490 495






Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
500 505 510

Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys
515 520 525

Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
530 535 540

Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
545 550 555 560

Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
565 570 575

Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
580 585 590

Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
595 600 605

Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
610 615 620

Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
625 630 635 640

Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
645 650 655

Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
660 665 670

Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
675 680 685

Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
690 695 700

Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
705 710 715 720

Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly
725 730 735

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1368 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1368

(D) OTHER INFORMATION: /product= "LF(1-254)--TR--PE(401-602)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GCG GGC GGT CAT GGT GAT GTA GGT ATG CAC GTA AAA GAG AAA GAG AAA 48
Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys
1 5 10 15

AAT AAA GAT GAG AAT AAG AGA AAA GAT GAA GAA CGA AAT AAA ACA CAG 96
Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30

GAA GAG CAT TTA AAG GAA ATC ATG AAA CAC ATT GTA AAA ATA GAA GTA 144
Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45

AAA GGG GAG GAA GCT GTT AAA AAA GAG GCA GCA GAA AAG CTA CTT GAG 192
Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu
50 55 60

AAA GTA CCA TCT GAT GTT TTA GAG ATG TAT AAA GCA ATT GGA GGA AAG 240
Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80

ATA TAT ATT GTG GAT GGT GAT ATT ACA AAA CAT ATA TCT TTA GAA GCA 288
Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95

TTA TCT GAA GAT AAG AAA AAA ATA AAA GAC ATT TAT GGG AAA GAT GCT 336
Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110

TTA TTA CAT GAA CAT TAT GTA TAT GCA AAA GAA GGA TAT GAA CCC GTA 384
Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val
115 120 125

CTT GTA ATC CAA TCT TCG GAA GAT TAT GTA GAA AAT ACT GAA AAG GCA 432
Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140

CTG AAC GTT TAT TAT GAA ATA GGT AAG ATA TTA TCA AGG GAT ATT TTA 480
Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu
145 150 155 160

AGT AAA ATT AAT CAA CCA TAT CAG AAA TTT TTA GAT GTA TTA AAT ACC 528
Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175

ATT AAA AAT GCA TCT GAT TCA GAT GGA CAA GAT CTT TTA TTT ACT AAT 576
Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190

CAG CTT AAG GAA CAT CCC ACA GAC TTT TCT GTA GAA TTC TTG GAA CAA 624
Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205

AAT AGC AAT GAG GTA CAA GAA GTA TTT GCG AAA GCT TTT GCA TAT TAT 672
Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220

ATC GAG CCA CAG CAT CGT GAT GTT TTA CAG CTT TAT GCA CCG GAA GCT 720
Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240






TTT AAT TAC ATG GAT AAA TTT AAC GAA CAA GAA ATA AAT CTA CTC GGC 768
Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Leu Gly
245 250 255

GAC GGC GGC GAC GTC AGC TTC AGC ACC CGC GGC ACG CAG AAC TGG ACG 816
Asp Gly Gly Asp Val Ser Phe Ser Thr Arg Gly Thr Gln Asn Trp Thr
260 265 270

GTG GAG CGG CTG CTC CAG GCG CAC CGC CAA CTG GAG GAG CGC GGC TAT 864
Val Glu Arg Leu Leu Gln Ala His Arg Gln Leu Glu Glu Arg Gly Tyr
275 280 285

GTG TTC GTC GGC TAC CAC GGC ACC TTC CTC GAA GCG GCG CAA AGC ATC 912
Val Phe Val Gly Tyr His Gly Thr Phe Leu Glu Ala Ala Gln Ser Ile
290 295 300

GTC TTC GGC GGG GTG CGC GCG CGC AGC CAG GAC CTC GAC GCG ATC TGG 960
Val Phe Gly Gly Val Arg Ala Arg Ser Gln Asp Leu Asp Ala Ile Trp
305 310 315 320

CGC GGT TTC TAT ATC GCC GGC GAT CCG GCG CTG GCC TAC GGC TAC GCC 1008
Arg Gly Phe Tyr Ile Ala Gly Asp Pro Ala Leu Ala Tyr Gly Tyr Ala
325 330 335

CAG GAC CAG GAA CCC GAC GCA CGC GGC CGG ATC CGC AAC GGT GCC CTG 1056
Gln Asp Gln Glu Pro Asp Ala Arg Gly Arg Ile Arg Asn Gly Ala Leu
340 345 350

CTG CGG GTC TAT GTG CCG CGC TCG AGC CTG CCG GGC TTC TAC CGC ACC 1104
Leu Arg Val Tyr Val Pro Arg Ser Ser Leu Pro Gly Phe Tyr Arg Thr
355 360 365

AGC CTG ACC CTG GCC GCG CCG GAG GCG GCG GGC GAG GTC GAA CGG CTG 1152
Ser Leu Thr Leu Ala Ala Pro Glu Ala Ala Gly Glu Val Glu Arg Leu
370 375 380

ATC GGC CAT CCG CTG CCG CTG CGC CTG GAC GCC ATC ACC GGC CCC GAG 1200
Ile Gly His Pro Leu Pro Leu Arg Leu Asp Ala Ile Thr Gly Pro Glu
385 390 395 400

GAG GAA GGC GGG CGC CTG GAG ACC ATT CTC GGC TGG CCG CTG GCC GAG 1248
Glu Glu Gly Gly Arg Leu Glu Thr Ile Leu Gly Trp Pro Leu Ala Glu
405 410 415

CGC ACC GTG GTG ATT CCC TCG GCG ATC CCC ACC GAC CCG CGC AAC GTC 1296
Arg Thr Val Val Ile Pro Ser Ala Ile Pro Thr Asp Pro Arg Asn Val
420 425 430

GGC GGC GAC CTC GAC CCG TCC AGC ATC CCC GAC AAG GAA CAG GCG ATC 1344
Gly Gly Asp Leu Asp Pro Ser Ser Ile Pro Asp Lys Glu Gln Ala Ile
435 440 445

AGC GCC CTG CCG GAC TAC GCC AGC 1368 
Ser Ala Leu Pro Asp Tyr Ala Ser 
450 455 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 456 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 



Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys
1 5 10 15 






Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30

Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45

Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu
50 55 60

Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80

Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95

Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110

Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val
115 120 125

Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140

Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu
145 150 155 160

Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175

Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190

Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205

Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220

Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240

Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Leu Gly
245 250 255

Asp Gly Gly Asp Val Ser Phe Ser Thr Arg Gly Thr Gln Asn Trp Thr
260 265 270

Val Glu Arg Leu Leu Gln Ala His Arg Gln Leu Glu Glu Arg Gly Tyr
275 280 285

Val Phe Val Gly Tyr His Gly Thr Phe Leu Glu Ala Ala Gln Ser Ile
290 295 300

Val Phe Gly Gly Val Arg Ala Arg Ser Gln Asp Leu Asp Ala Ile Trp
305 310 315 320

Arg Gly Phe Tyr Ile Ala Gly Asp Pro Ala Leu Ala Tyr Gly Tyr Ala
325 330 335






Gln Asp Gln Glu Pro Asp Ala Arg Gly Arg Ile Arg Asn Gly Ala Leu
340 345 350

Leu Arg Val Tyr Val Pro Arg Ser Ser Leu Pro Gly Phe Tyr Arg Thr
355 360 365

Ser Leu Thr Leu Ala Ala Pro Glu Ala Ala Gly Glu Val Glu Arg Leu
370 375 380

Ile Gly His Pro Leu Pro Leu Arg Leu Asp Ala Ile Thr Gly Pro Glu
385 390 395 400

Glu Glu Gly Gly Arg Leu Glu Thr Ile Leu Gly Trp Pro Leu Ala Glu
405 410 415

Arg Thr Val Val Ile Pro Ser Ala Ile Pro Thr Asp Pro Arg Asn Val
420 425 430

Gly Gly Asp Leu Asp Pro Ser Ser Ile Pro Asp Lys Glu Gln Ala Ile
435 440 445

Ser Ala Leu Pro Asp Tyr Ala Ser
450 455

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1425 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1416

(D) OTHER INFORMATION: /product= "LF(1-254)--TR--PE(398-613)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

ATG GTA CCA GCG GGC GGT CAT GGT GAT GTA GGT ATG CAC GTA AAA GAG 48
Met Val Pro Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu
1 5 10 15

AAA GAG AAA AAT AAA GAT GAG AAT AAG AGA AAA GAT GAA GAA CGA AAT 96
Lys Glu Lys Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn
20 25 30

AAA ACA CAG GAA GAG CAT TTA AAG GAA ATC ATG AAA CAC ATT GTA AAA 144
Lys Thr Gln Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys
35 40 45

ATA GAA GTA AAA GGG GAG GAA GCT GTT AAA AAA GAG GCA GCA GAA AAG 192
Ile Glu Val Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys
50 55 60

CTA CTT GAG AAA GTA CCA TCT GAT GTT TTA GAG ATG TAT AAA GCA ATT 240
Leu Leu Glu Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile
65 70 75 80



GGA GGA AAG ATA TAT ATT GTG GAT GGT GAT ATT ACA AAA CAT ATA TCT 288
Gly Gly Lys Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser
85 90 95

TTA GAA GCA TTA TCT GAA GAT AAG AAA AAA ATA AAA GAC ATT TAT GGG 336
Leu Glu Ala Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly
100 105 110

AAA GAT GCT TTA TTA CAT GAA CAT TAT GTA TAT GCA AAA GAA GGA TAT 384
Lys Asp Ala Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr
115 120 125

GAA CCC GTA CTT GTA ATC CAA TCT TCG GAA GAT TAT GTA GAA AAT ACT 432
Glu Pro Val Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr
130 135 140

GAA AAG GCA CTG AAC GTT TAT TAT GAA ATA GGT AAG ATA TTA TCA AGG 480
Glu Lys Ala Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg
145 150 155 160

GAT ATT TTA AGT AAA ATT AAT CAA CCA TAT CAG AAA TTT TTA GAT GTA 528
Asp Ile Leu Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val
165 170 175



TTA AAT ACC ATT AAA AAT GCA TCT GAT TCA GAT GGA CAA GAT CTT TTA 576
Leu Asn Thr Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu
180 185 190

TTT ACT AAT CAG CTT AAG GAA CAT CCC ACA GAC TTT TCT GTA GAA TTC 624
Phe Thr Asn Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe
195 200 205

TTG GAA CAA AAT AGC AAT GAG GTA CAA GAA GTA TTT GCG AAA GCT TTT 672
Leu Glu Gln Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe
210 215 220

GCA TAT TAT ATC GAG CCA CAG CAT CGT GAT GTT TTA CAG CTT TAT GCA 720
Ala Tyr Tyr Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala
225 230 235 240

CCG GAA GCT TTT AAT TAC ATG GAT AAA TTT AAC GAA CAA GAA ATA AAT 768
Pro Glu Ala Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn
245 250 255



CTA ACG CGT GCG GAG TTC CTC GGC GAC GGC GGC GAC GTC AGC TTC AGC 816
Leu Thr Arg Ala Glu Phe Leu Gly Asp Gly Gly Asp Val Ser Phe Ser
260 265 270

ACC CGC GGC ACG CAG AAC TGG ACG GTG GAG CGG CTG CTC CAG GCG CAC 864
Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala His
275 280 285

CGC CAA CTG GAG GAG CGC GGC TAT GTG TTC GTC GGC TAC CAC GGC ACC 912
Arg Gln Leu Glu Glu Arg Gly Tyr Val Phe Val Gly Tyr His Gly Thr
290 295 300

TTC CTC GAA GCG GCG CAA AGC ATC GTC TTC GGC GGG GTG CGC GCG CGC 960
Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala Arg
305 310 315 320

AGC CAG GAC CTC GAC GCG ATC TGG CGC GGT TTC TAT ATC GCC GGC GAT 1008
Ser Gln Asp Leu Asp Ala Ile Trp Arg Gly Phe Tyr Ile Ala Gly Asp
325 330 335



CCG GCG CTG GCC TAC GGC TAC GCC CAG GAC CAG GAA CCC GAC GCA CGC
Pro Ala Leu Ala Tyr Gly Tyr Ala Gln Asp Gln Glu Pro Asp Ala Arg
340 345 350

GGC CGG ATC CGC AAC GGT GCC CTG CTG CGG GTC TAT GTG CCG CGC TCG
Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg Ser
355 360 365

AGC CTG CCG GGC TTC TAC CGC ACC AGC CTG ACC CTG GCC GCG CCG GAG
Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro Glu
370 375 380

GCG GCG GGC GAG GTC GAA CGG CTG ATC GGC CAT CCG CTG CCG CTG CGC 1200
Ala Ala Gly Glu Val Glu Arg Leu Ile Gly His Pro Leu Pro Leu Arg
385 390 395 400

CTG GAC GCC ATC ACC GGC CCC GAG GAG GAA GGC GGG CGC CTG GAG ACC 1248
Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu Thr
405 410 415

ATT CTC GGC TGG CCG CTG GCC GAG CGC ACC GTG GTG ATT CCC TCG GCG 1296
Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser Ala
420 425 430

ATC CCC ACC GAC CCG CGC AAC GTC GGC GGC GAC CTC GAC CCG TCC AGC 1344
Ile Pro Thr Asp Pro Arg Asn Val Gly Gly Asp Leu Asp Pro Ser Ser
435 440 445

ATC CCC GAC AAG GAA CAG GCG ATC AGC GCC CTG CCG GAC TAC GCC AGC 1392
Ile Pro Asp Lys Glu Gln Ala Ile Ser Ala Leu Pro Asp Tyr Ala Ser
450 455 460

CAG CCC GGC AAA CCG CCG CGC GAG GACCTGAAG 1425
Gln Pro Gly Lys Pro Pro Arg Glu
465 470

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 472 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Val Pro Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu
1 5 10 15

Lys Glu Lys Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn
20 25 30

Lys Thr Gln Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys
35 40 45

Ile Glu Val Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys
50 55 60

Leu Leu Glu Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile
65 70 75 80

Gly Gly Lys Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser
85 90 95

Leu Glu Ala Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly
100 105 110

Lys Asp Ala Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr
115 120 125

Glu Pro Val Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr
130 135 140

Glu Lys Ala Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg
145 150 155 160

Asp Ile Leu Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val
165 170 175

Leu Asn Thr Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu
180 185 190

Phe Thr Asn Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe
195 200 205

Leu Glu Gln Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe
210 215 220

Ala Tyr Tyr Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala
225 230 235 240

Pro Glu Ala Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn
245 250 255

Leu Thr Arg Ala Glu Phe Leu Gly Asp Gly Gly Asp Val Ser Phe Ser
260 265 270

Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala His
275 280 285

Arg Gln Leu Glu Glu Arg Gly Tyr Val Phe Val Gly Tyr His Gly Thr
290 295 300

Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala Arg
305 310 315 320

Ser Gln Asp Leu Asp Ala Ile Trp Arg Gly Phe Tyr Ile Ala Gly Asp
325 330 335

Pro Ala Leu Ala Tyr Gly Tyr Ala Gln Asp Gln Glu Pro Asp Ala Arg
340 345 350

Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg Ser
355 360 365

Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro Glu
370 375 380

Ala Ala Gly Glu Val Glu Arg Leu Ile Gly His Pro Leu Pro Leu Arg
385 390 395 400

Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu Thr
405 410 415

Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser Ala
420 425 430

Ile Pro Thr Asp Pro Arg Asn Val Gly Gly Asp Leu Asp Pro Ser Ser
435 440 445

Ile Pro Asp Lys Glu Gln Ala Ile Ser Ala Leu Pro Asp Tyr Ala Ser
450 455 460

Gln Pro Gly Lys Pro Pro Arg Glu
465 470

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1524 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..1524

(D) OTHER INFORMATION: /product= "LF(1-254)-TR-PE(362-613)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

GCG GGC GGT CAT GGT GAT GTA GGT ATG CAC GTA AAA GAG AAA GAG AAA 48
Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys
1 5 10 15

AAT AAA GAT GAG AAT AAG AGA AAA GAT GAA GAA CGA AAT AAA ACA CAG 96
Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30

GAA GAG CAT TTA AAG GAA ATC ATG AAA CAC ATT GTA AAA ATA GAA GTA 144
Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45

AAA GGG GAG GAA GCT GTT AAA AAA GAG GCA GCA GAA AAG CTA CTT GAG 192
Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu
50 55 60

AAA GTA CCA TCT GAT GTT TTA GAG ATG TAT AAA GCA ATT GGA GGA AAG 240
Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80

ATA TAT ATT GTG GAT GGT GAT ATT ACA AAA CAT ATA TCT TTA GAA GCA 288 
Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95

TTA TCT GAA GAT AAG AAA AAA ATA AAA GAC ATT TAT GGG AAA GAT GCT 336
Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110

TTA TTA CAT GAA CAT TAT GTA TAT GCA AAA GAA GGA TAT GAA CCC GTA 384
Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val
115 120 125

CTT GTA ATC CAA TCT TCG GAA GAT TAT GTA GAA AAT ACT GAA AAG GCA 432
Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140

CTG AAC GTT TAT TAT GAA ATA GGT AAG ATA TTA TCA AGG GAT ATT TTA 480
Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu
145 150 155 160



AGT AAA ATT AAT CAA CCA TAT CAG AAA TTT TTA GAT GTA TTA AAT ACC 528
Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175

ATT AAA AAT GCA TCT GAT TCA GAT GGA CAA GAT CTT TTA TTT ACT AAT 576
Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190

CAG CTT AAG GAA CAT CCC ACA GAC TTT TCT GTA GAA TTC TTG GAA CAA 624
Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205

AAT AGC AAT GAG GTA CAA GAA GTA TTT GCG AAA GCT TTT GCA TAT TAT 672
Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220

ATC GAG CCA CAG CAT CGT GAT GTT TTA CAG CTT TAT GCA CCG GAA GCT 720
Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240

TTT AAT TAC ATG GAT AAA TTT AAC GAA CAA GAA ATA AAT CTA ACG CGT 768
Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Thr Arg
245 250 255

GCG GCC AAC GCC GAC GTG GTG AGC CTG ACC TGC CCG GTC GCC CCC GGT 816
Ala Ala Asn Ala Asp Val Val Ser Leu Thr Cys Pro Val Ala Ala Gly
260 265 270

GAA TGC GCG GGC CCG GCG GAC AGC GGC GAC GCC CTG CTG GAG CGC AAC 864
Glu Cys Ala Gly Pro Ala Asp Ser Gly Asp Ala Leu Leu Glu Arg Asn
275 280 285

TAT CCC ACT GGC GCG GAG TTC CTC GGC GAC GGC GGC GAC GTC AGC TTC 912
Tyr Pro Thr Gly Ala Glu Phe Leu Gly Asp Gly Gly Asp Val Ser Phe
290 295 300

AGC ACC CGC GGC ACG CAG AAC TGG ACG GTG GAG CGG CTG CTC CAG GCG 960
Ser Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala
305 310 315 320

CAC CGC CAA CTG GAG GAG CGC GGC TAT GTG TTC GTC GGC TAC CAC GGC 1008
His Arg Gln Leu Glu Glu Arg Gly Tyr Val Phe Val Gly Tyr His Gly
325 330 335

ACC TTC CTC GAA GCG GCG CAA AGC ATC GTC TTC GGC GGG GTG CGC GCG 1056 
Thr Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala
340 345 350

CGC AGC CAG GAC CTC GAC GCG ATC TGG CGC GGT TTC TAT ATC GCC GGC 1104
Arg Ser Gln Asp Leu Asp Ala Ile Trp Arg Gly Phe Tyr Ile Ala Gly
355 360 365

GAT CCG GCG CTG GCC TAC GGC TAC GCC CAG GAC CAG GAA CCC GAC GCA 1152
Asp Pro Ala Leu Ala Tyr Gly Tyr Ala Gln Asp Gln Glu Pro Asp Ala
370 375 380

CGC GGC CGG ATC CGC AAC GGT GCC CTG CTG CGG GTC TAT GTG CCG CGC 1200
Arg Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg
385 390 395 400

TCG AGC CTG CCG GGC TTC TAC CGC ACC AGC CTG ACC CTG GCC GCG CCG 1248
Ser Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro
405 410 415

GAG GCG GCG GGC GAG GTC GAA CGG CTG ATC GGC CAT CCG CTG CCG CTG 1296
Glu Ala Ala Gly Glu Val Glu Arg Leu Ile Gly His Pro Leu Pro Leu
420 425 430

CGC CTG GAC GCC ATC ACC GGC CCC GAG GAG GAA GGC GGG CGC CTG GAG 1344
Arg Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu
435 440 445

ACC ATT CTC GGC TGG CCG CTG GCC GAG CGC ACC GTG GTG ATT CCC TCG 1392
Thr Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser
450 455 460

GCG ATC CCC ACC GAC CCG CGC AAC GTC GGC GGC GAC CTC GAC CCG TCC 1440
Ala Ile Pro Thr Asp Pro Arg Asn Val Gly Gly Asp Leu Asp Pro Ser
465 470 475 480

AGC ATC CCC GAC AAG GAA CAG GCG ATC AGC GCC CTG CCG GAC TAC GCC 1488
Ser Ile Pro Asp Lys Glu Gln Ala Ile Ser Ala Leu Pro Asp Tyr Ala
485 490 495

AGC CAG CCC GGC AAA CCG CCG CGC GAG GAC CTG AAG 1524
Ser Gln Pro Gly Lys Pro Pro Arg Glu Asp Leu Lys
500 505

(2) INFORMATION FOR SEQ ID NO: 10: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 508 amino acids
(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Ala Gly Gly His Gly Asp Val Gly Met His Val Lys Glu Lys Glu Lys
1 5 10 15

Asn Lys Asp Glu Asn Lys Arg Lys Asp Glu Glu Arg Asn Lys Thr Gln
20 25 30

Glu Glu His Leu Lys Glu Ile Met Lys His Ile Val Lys Ile Glu Val
35 40 45

Lys Gly Glu Glu Ala Val Lys Lys Glu Ala Ala Glu Lys Leu Leu Glu
50 55 60

Lys Val Pro Ser Asp Val Leu Glu Met Tyr Lys Ala Ile Gly Gly Lys
65 70 75 80

Ile Tyr Ile Val Asp Gly Asp Ile Thr Lys His Ile Ser Leu Glu Ala
85 90 95

Leu Ser Glu Asp Lys Lys Lys Ile Lys Asp Ile Tyr Gly Lys Asp Ala
100 105 110

Leu Leu His Glu His Tyr Val Tyr Ala Lys Glu Gly Tyr Glu Pro Val
115 120 125

Leu Val Ile Gln Ser Ser Glu Asp Tyr Val Glu Asn Thr Glu Lys Ala
130 135 140

Leu Asn Val Tyr Tyr Glu Ile Gly Lys Ile Leu Ser Arg Asp Ile Leu
145 150 155 160

Ser Lys Ile Asn Gln Pro Tyr Gln Lys Phe Leu Asp Val Leu Asn Thr
165 170 175

Ile Lys Asn Ala Ser Asp Ser Asp Gly Gln Asp Leu Leu Phe Thr Asn
180 185 190

Gln Leu Lys Glu His Pro Thr Asp Phe Ser Val Glu Phe Leu Glu Gln
195 200 205

Asn Ser Asn Glu Val Gln Glu Val Phe Ala Lys Ala Phe Ala Tyr Tyr
210 215 220

Ile Glu Pro Gln His Arg Asp Val Leu Gln Leu Tyr Ala Pro Glu Ala
225 230 235 240

Phe Asn Tyr Met Asp Lys Phe Asn Glu Gln Glu Ile Asn Leu Thr Arg
245 250 255

Ala Ala Asn Ala Asp Val Val Ser Leu Thr Cys Pro Val Ala Ala Gly
260 265 270

Glu Cys Ala Gly Pro Ala Asp Ser Gly Asp Ala Leu Leu Glu Arg Asn
275 280 285

Tyr Pro Thr Gly Ala Glu Phe Leu Gly Asp Gly Gly Asp Val Ser Phe
290 295 300

Ser Thr Arg Gly Thr Gln Asn Trp Thr Val Glu Arg Leu Leu Gln Ala
305 310 315 320

His Arg Gln Leu Glu Glu Arg Gly Tyr Val Phe Val Gly Tyr His Gly
325 330 335

Thr Phe Leu Glu Ala Ala Gln Ser Ile Val Phe Gly Gly Val Arg Ala
340 345 350

Arg Ser Gln Asp Leu Asp Ala Ile Trp Arg Gly Phe Tyr Ile Ala Gly
355 360 365

Asp Pro Ala Leu Ala Tyr Gly Tyr Ala Gln Asp Gln Glu Pro Asp Ala
370 375 380

Arg Gly Arg Ile Arg Asn Gly Ala Leu Leu Arg Val Tyr Val Pro Arg
385 390 395 400

Ser Ser Leu Pro Gly Phe Tyr Arg Thr Ser Leu Thr Leu Ala Ala Pro
405 410 415

Glu Ala Ala Gly Glu Val Glu Arg Leu Ile Gly His Pro Leu Pro Leu
420 425 430

Arg Leu Asp Ala Ile Thr Gly Pro Glu Glu Glu Gly Gly Arg Leu Glu
435 440 445

Thr Ile Leu Gly Trp Pro Leu Ala Glu Arg Thr Val Val Ile Pro Ser
450 455 460

Ala Ile Pro Thr Asp Pro Arg Asn Val Gly Gly Asp Leu Asp Pro Ser
465 470 475 480

Ser Ile Pro Asp Lys Glu Gln Ala Ile Ser Ala Leu Pro Asp Tyr Ala
485 490 495

Ser Gln Pro Gly Lys Pro Pro Arg Glu Asp Leu Lys
500 505



(2) INFORMATION FOR SEQ ID NO: 11:

ACG GTT GAT GTC AAA AAT AAA AGA ACT TTT CTT TCA CCA TGG ATT TCT 624
Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser
195 200 205

AAT ATT CAT GAA AAG AAA GGA TTA ACC AAA TAT AAA TCA TCT CCT GAA 672
Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro Glu
210 215 220

AAA TGG AGC ACG GCT TCT GAT CCG TAC AGT GAT TTC GAA AAG GTT ACA 720
Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val Thr
225 230 235 240

GGA CGG ATT GAT AAG AAT GTA TCA CCA GAG GCA AGA CAC CCC CTT GTG 768
Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu Val
245 250 255

GCA GCT TAT CCG ATT GTA CAT GTA GAT ATG GAG AAT ATT ATT CTC TCA 816
Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu Ser
260 265 270

AAA AAT GAG GAT CAA TCC ACA CAG AAT ACT GAT AGT GAA ACG AGA ACA 864
Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Glu Thr Arg Thr
275 280 285

ATA AGT AAA AAT ACT TCT ACA AGT AGG ACA CAT ACT AGT GAA GTA CAT 912
Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val His
290 295 300

GGA AAT GCA GAA GTG CAT GCG TCG TTC TTT GAT ATT GGT GGG AGT GTA 960
Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser Val
305 310 315 320

TCT GCA GGA TTT AGT AAT TCG AAT TCA AGT ACG GTC GCA ATT GAT CAT 1008
Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp His
325 330 335

TCA CTA TCT CTA GCA GGG GAA AGA ACT TGG GCT GAA ACA ATG GGT TTA 1056
Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly Leu
340 345 350

AAT ACC GCT GAT ACA GCA AGA TTA AAT GCC AAT ATT AGA TAT GTA AAT 1104
Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val Asn
355 360 365

ACT GGG ACG GCT CCA ATC TAC AAC GTG TTA CCA ACG ACT TCG TTA GTG 1152
Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu Val
370 375 380

TTA GGA AAA AAT CAA ACA CTC GCG ACA ATT AAA GCT AAG GAA AAC CAA 1200
Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Lys Glu Asn Gln
385 390 395 400

TTA AGT CAA ATA CTT GCA CCT AAT AAT TAT TAT CCT TCT AAA AAC TTG 1248
Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn Leu
405 410 415

GC'i "CA ATC GCA TTA AAT GCA CAA GAC GAT TTC AGT TCT ACT CCA ATT 12^6 
Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser Ser Thr Pro Ile
420 425 430 



ACA ATG AAT TAC AAT CAA TTT GTT GAG TTA GAA AAA ACG AAA CAA TTA 1344 
Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu
435 440 445

AGA TTA GAT ACG GAT CAA GTA TAT GGG AAT ATA GCA ACA TAC AAT TTT 1392
Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
450 455 460

GAA AAT GGA AGA GTG AGG GTG GAT ACA GGC TCG AAC TGG AGT GAA GTG 1440
Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val
465 470 475 480

TTA CCG CAA ATT CAA GAA ACA ACT GCA CGT ATC ATT TTT AAT GGA AAA 1488 
Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
485 490 495

GAT TTA AAT CTG GTA GAA AGG CGG ATA GCG GCG GTT AAT CCT AGT GAT 1536
Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
500 505 510

CCA TTA GAA ACG ACT AAA CCG GAT ATG ACA TTA AAA GAA GCC CTT AAA 1584
Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys
515 520 525

ATA GCA TTT GGA TTT AAC GAA CCG AAT GGA AAC TTA CAA TAT CAA GGG 1632 
Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
530 535 540

AAA GAC ATA ACC GAA TTT GAT TTT AAT TTC GAT CAA CAA ACA TCT CAA 1680
Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
545 550 555 560

AAT ATC AAG AAT CAG TTA GCG GAA TTA AAC GCA ACT AAC ATA TAT ACT 1728
Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
565 570 575

GTA TTA GAT AAA ATC AAA TTA AAT GCA AAA ATG AAT ATT TTA ATA AGA 1776
Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
580 585 590

GAT AAA CGT TTT CAT TAT GAT AGA AAT AAC ATA GCA GTT GGG GCG GAT 1824
Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
595 600 605

GAG TCA GTA GTT AAG GAG GCT CAT AGA GAA GTA ATT AAT TCG TCA ACA 1872
Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
610 615 620

GAG GGA TTA TTG TTA AAT ATT GAT AAG GAT ATA AGA AAA ATA TTA TCA 1920
Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
625 630 635 640

GGT TAT ATT GTA GAA ATT GAA GAT ACT GAA GGG CTT AAA GAA GTT ATA 1968
Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
645 650 655

AAT GAC AGA TAT GAT ATG TTG AAT ATT TCT AGT TTA CGG CAA GAT GGA 2016
Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
660 665 670

AAA ACA TTT ATA GAT TTT AAA AAA TAT AAT GAT AAA TTA CCG TTA TAT 2064
Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
675 680 685

ATA AGT AAT CCC AAT TAT AAG GTA AAT GTA TAT GCT GTT ACT AAA GAA 2112
Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
690 695 700

AAC ACT ATT ATT AAT CCT AGT GAG AAT GGG GAT ACT AGT ACC AAC GGG 2160
Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
705 710 715 720

ATC AAG AAA ATT TTA AAG AAA GTG GTG CTG GGC AAA AAA GGG GAT ACA 2208
Ile Lys Lys Ile Leu Lys Lys Val Val Leu Gly Lys Lys Gly Asp Thr
725 730 735

GTG GAA CTG ACC TGT ACA GCT TCC CAG AAG AAG AGC ATA CAA TTC CAC 2256
Val Glu Leu Thr Cys Thr Ala Ser Gln Lys Lys Ser Ile Gln Phe His
740 745 750

TGG AAA AAC TCC AAC CAG ATA AAG ATT CTG GGA AAT CAG GGC TCC TTC 2304
Trp Lys Asn Ser Asn Gln Ile Lys Ile Leu Gly Asn Gln Gly Ser Phe
755 760 765

TTA ACT AAA GGT CCA TCC AAG CTG AAT GAT CGC GCT GAC TCA AGA AGA 2352
Leu Thr Lys Gly Pro Ser Lys Leu Asn Asp Arg Ala Asp Ser Arg Arg
770 775 780

AGC CTT TGG GAC CAA GGA AAC TTC CCC CTG ATC ATC AAG AAT CTT AAG 2400
Ser Leu Trp Asp Gln Gly Asn Phe Pro Leu Ile Ile Lys Asn Leu Lys
785 790 795 800

ATA GAA GAC TCA GAT ACT TAC ATC TGT GAA GTG GAG GAC CAG AAG GAG 2448
Ile Glu Asp Ser Asp Thr Tyr Ile Cys Glu Val Glu Asp Gln Lys Glu
805 810 815

GAG GTG CAA TTG CTA GTG TTC GGA TTG ACT GCC AAC TCT GAC ACC CAC 2496
Glu Val Gln Leu Leu Val Phe Gly Leu Thr Ala Asn Ser Asp Thr His
820 825 830

CTG CTT CAG GGG CAG AGC CTG ACC CTG ACC TTG GAG AGC CCC CCT GGT 2544
Leu Leu Gln Gly Gln Ser Leu Thr Leu Thr Leu Glu Ser Pro Pro Gly
835 840 845

AGT AGC CCC TCA GTG CAA TGT AGG AGT CCA AGG GGT AAA AAC ATA CAG 2592
Ser Ser Pro Ser Val Gln Cys Arg Ser Pro Arg Gly Lys Asn Ile Gln
850 855 860

GGG GGG AAG ACC CTC TCC GTG TCT CAG CTG GAG CTC CAG GAT AGT GGC 2640
Gly Gly Lys Thr Leu Ser Val Ser Gln Leu Glu Leu Gln Asp Ser Gly
865 870 875 880

ACC TGG ACA TGC ACT GTC TTG CAG AAC CAG AAG AAG GTG GAG TTC AAA 2688
Thr Trp Thr Cys Thr Val Leu Gln Asn Gln Lys Lys Val Glu Phe Lys
885 890 895

ATA GAC ATC GTG GTG CTA GCT 2709
Ile Asp Ile Val Val Leu Ala
900



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 903 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15

Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30

Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45

Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60

Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80

Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95

Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg
100 105 110



Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125

Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140

Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser
145 150 155 160

Ser Asn Ser Arg Lys Lys Arg Ser Thr Ser Ala Gly Pro Thr Val Pro
165 170 175

Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu Val Glu Gly Tyr
180 185 190

Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser Pro Trp Ile Ser
195 200 205

Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys Ser Ser Pro Glu
210 215 220

Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe Glu Lys Val Thr
225 230 235 240



Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg His Pro Leu Val
245 250 255

Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn Ile Ile Leu Ser
260 265 270

Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser Glu Thr Arg Thr
275 280 285

Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr Ser Glu Val His
290 295 300

Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile Gly Gly Ser Val
305 310 315 320

Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val Ala Ile Asp His
325 330 335

Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu Thr Met Gly Leu
340 345 350

Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile Arg Tyr Val Asn
355 360 365

Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr Thr Ser Leu Val
370 375 380

Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala Lys Glu Asn Gln
385 390 395 400

Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro Ser Lys Asn Leu
405 410 415

Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser Ser Thr Pro Ile
420 425 430

Thr Met Asn Tyr Asn Gln Phe Leu Glu Leu Glu Lys Thr Lys Gln Leu
435 440 445

Arg Leu Asp Thr Asp Gln Val Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
450 455 460

Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val
465 470 475 480

Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
485 490 495

Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
500 505 510

Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys
515 520 525

Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
530 535 540

Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
545 550 555 560

Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
565 570 575

Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
580 585 590

Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
595 600 605

Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
610 615 620

Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
625 630 635 640

Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
645 650 655

Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
660 665 670

Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
675 680 685

Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
690 695 700

Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
705 710 715 720

Ile Lys Lys Ile Leu Lys Lys Val Val Leu Gly Lys Lys Gly Asp Thr
725 730 735

Val Glu Leu Thr Cys Thr Ala Ser Gln Lys Lys Ser Ile Gln Phe His
740 745 750

Trp Lys Asn Ser Asn Gln Ile Lys Ile Leu Gly Asn Gln Gly Ser Phe
755 760 765

Leu Thr Lys Gly Pro Ser Lys Leu Asn Asp Arg Ala Asp Ser Arg Arg
770 775 780

Ser Leu Trp Asp Gln Gly Asn Phe Pro Leu Ile Ile Lys Asn Leu Lys
785 790 795 800

Ile Glu Asp Ser Asp Thr Tyr Ile Cys Glu Val Glu Asp Gln Lys Glu
805 810 815

Glu Val Gln Leu Leu Val Phe Gly Leu Thr Ala Asn Ser Asp Thr His
820 825 830

Leu Leu Gln Gly Gln Ser Leu Thr Leu Thr Leu Glu Ser Pro Pro Gly
835 840 845

Ser Ser Pro Ser Val Gln Cys Arg Ser Pro Arg Gly Lys Asn Ile Gln
850 855 860

Gly Gly Lys Thr Leu Ser Val Ser Gln Leu Glu Leu Gln Asp Ser Gly
865 870 875 880

Thr Trp Thr Cys Thr Val Leu Gln Asn Gln Lys Lys Val Glu Phe Lys
885 890 895

Ile Asp Ile Val Val Leu Ala
900

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..8 

(D) OTHER INFORMATION: /label = PAHIV 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Ser Gln Asn Tyr Pro Val Val Gln
1 5 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Bacillus anthracis 

(ix) FEATURE: 

(A) NAME/KEY: Peptide 

(B) LOCATION: 1..12

(D) OTHER INFORMATION: /label= PAHIV-1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Gln Val Ser Gln Asn Tyr Pro Ile Val Gln Asn Ile
1 5 10

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: NO 

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:
(A) NAME/KEY: Peptide

(B) LOCATION: 1..12

(D) OTHER INFORMATION: /label= PAHIV-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..12

(D) OTHER INFORMATION: /label= PAHIV-3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Thr Val Ser Phe Asn Phe Pro Gln Ile Thr Leu Trp
1 5 10

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..13

(D) OTHER INFORMATION: /label= PAHIV-4

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Gly Gly Ser Ala Phe Asn Phe Pro Ile Val Met Gly Gly
1 5 10

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 3..44

(D) OTHER INFORMATION: /product= "Primer 1A"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CG CAA GTA TCA CAA AAT TAT CCG ATC GTG CAA AAC ATA CTG CAG 44
Gln Val Ser Gln Asn Tyr Pro Ile Val Gln Asn Ile Leu Gln
1 5 10

G 45
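
As a consistency check, the CDS annotated at positions 3..44 of SEQ ID NO: 18 can be translated with the standard genetic code and compared with the 14-residue peptide printed beneath it. The following is a minimal Python sketch; the names `CODON_TABLE` and `translate` are illustrative assumptions and are not part of the listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (assumed table; first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int, end: int) -> str:
    """Translate the 1-based, inclusive region start..end of dna (spaces ignored)."""
    region = dna.replace(" ", "").upper()[start - 1:end]
    return "".join(CODON_TABLE[region[i:i + 3]]
                   for i in range(0, len(region) - 2, 3))


# Primer 1A as listed in SEQ ID NO: 18 (45 bases, CDS at 3..44).
PRIMER_1A = "CG CAA GTA TCA CAA AAT TAT CCG ATC GTG CAA AAC ATA CTG CAG G"
peptide = translate(PRIMER_1A, 3, 44)
print(peptide)
```

The one-letter result corresponds to the three-letter peptide of SEQ ID NO: 19 (Gln Val Ser Gln Asn Tyr Pro Ile Val Gln Asn Ile Leu Gln).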
(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Gln Val Ser Gln Asn Tyr Pro Ile Val Gln Asn Ile Leu Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /product= "PRIMER 1B"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
GTTCCTGCAG TATGTTTTGC ACGATCGGAT AATTTTGTGA TACTTG 46
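
Primer 1B is flagged ANTI-SENSE, so its reverse complement should overlap the sense-strand Primer 1A of SEQ ID NO: 18. The sketch below checks this under the assumption that the two oligonucleotides are meant to anneal over the region downstream of the two-base 5' overhang (CG) of Primer 1A; the helper name `reverse_complement` is illustrative and not taken from the listing.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (spaces ignored)."""
    pairs = str.maketrans("ACGT", "TGCA")
    return dna.replace(" ", "").upper().translate(pairs)[::-1]


# Sequences as printed in SEQ ID NO: 18 and SEQ ID NO: 20.
PRIMER_1A = "CG CAA GTA TCA CAA AAT TAT CCG ATC GTG CAA AAC ATA CTG CAG G"
PRIMER_1B = "GTTCCTGCAG TATGTTTTGC ACGATCGGAT AATTTTGTGA TACTTG"

rc_1b = reverse_complement(PRIMER_1B)
overlap = PRIMER_1A.replace(" ", "")[2:]  # Primer 1A without its 5' CG overhang

print(rc_1b)
print(rc_1b.startswith(overlap))
```

With the sequences as printed, the reverse complement of Primer 1B begins with the 43 bases of Primer 1A that follow the CG overhang, which is what one would expect of a complementary oligonucleotide pair.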



WO 94/18332 PCT/US94/01624

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 3..44

(D) OTHER INFORMATION: /product= "Primer 2A"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

CG AAC ACT GCC ACT ATC ATG ATG CAA CGT GGT AAT TTT CTG CAG 44
Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe Leu Gln
1 5 10

G 45



(2) INFORMATION FOR SEQ ID NO: 22: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids
(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe Leu Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1..46
(D) OTHER INFORMATION: /product= "PRIMER 2B"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GTCCCTGCAG AAAATTACCA CGTTGCATCA TGATAGTGGC AGTGTT 46
(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 3..44

(D) OTHER INFORMATION: /product= "Primer 3A"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

CG ACT GTC TCT TTT AAC TTC CCG CAA ATC ACG CTT TGG CTG CAG 44
Thr Val Ser Phe Asn Phe Pro Gln Ile Thr Leu Trp Leu Gln
1 5 10

G 45

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Thr Val Ser Phe Asn Phe Pro Gln Ile Thr Leu Trp Leu Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /product= "PRIMER 3B"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
GTCCCTGCAG CCAAAGCGTG ATTTGCGGGA AGTTAAAAGA GACAGT 46

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 3..47

(D) OTHER INFORMATION: /product= "Primer 4A"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

CG GGC GGT TCT GCC TTT AAC TTC CCG ATC GTC ATG GGA GGT CTG CAG 47
Gly Gly Ser Ala Phe Asn Phe Pro Ile Val Met Gly Gly Leu Gln
1 5 10 15

G 48



35 (2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

Gly Gly Ser Ala Phe Asn Phe Pro Ile Val Met Gly Gly Leu Gln
1 5 10 15
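As a consistency check, the CDS feature of SEQ ID NO:27 (positions 3..47) can be translated with the standard genetic code and compared with SEQ ID NO:28. The following is a minimal Python sketch of that check; the helper name `translate` and the variable names are illustrative assumptions and are not part of the Sequence Listing.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# The first 47 nucleotides of Primer 4A (SEQ ID NO:27) as printed above.
PRIMER_4A = "CGGGCGGTTCTGCCTTTAACTTCCCGATCGTCATGGGAGGTCTGCAG"

# The CDS location 3..47 is 1-based and inclusive, i.e. the slice [2:47].
peptide = translate(PRIMER_4A[2:47])
print(peptide)  # GGSAFNFPIVMGGLQ
```

The one-letter result corresponds to the fifteen residues of SEQ ID NO:28, and its first thirteen residues are the GGSAFNFPIVMGG sequence recited in the claims.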


(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 49 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1..49

(D) OTHER INFORMATION: /product= "PRIMER 4B"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 
GTCCCTGCAG ACCTCCCATG ACGATCGGGA AGTTAAAGGC AGAACCGCC 49 
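Primer 4B is listed as the anti-sense partner of Primer 4A. One way to confirm that relationship is to take the reverse complement of SEQ ID NO:29 and check that it begins with the coding region of SEQ ID NO:27. The sketch below does this in Python; the function and variable names are illustrative assumptions, not part of the listing.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Primer 4B (SEQ ID NO:29) as printed above, with the spaces removed.
PRIMER_4B = "GTCCCTGCAGACCTCCCATGACGATCGGGAAGTTAAAGGCAGAACCGCC"

# Coding region of Primer 4A (SEQ ID NO:27, positions 3..47).
PRIMER_4A_CDS = "GGCGGTTCTGCCTTTAACTTCCCGATCGTCATGGGAGGTCTGCAG"

rc = reverse_complement(PRIMER_4B)
print(len(PRIMER_4B))                # 49
print(rc.startswith(PRIMER_4A_CDS))  # True
```

The remaining nucleotides at the 3' end of the reverse complement lie outside the coding region of Primer 4A.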
(2) INFORMATION FOR SEQ ID NO: 30: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2160 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Bacillus anthracis

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..2157

(D) OTHER INFORMATION: /product= "PAHIV#2"




CCA TGG ATT TCT AAT ATT CAT GAA AAG AAA GGA TTA ACC AAA TAT AAA 672 
Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys
210 215 220

TCA TCT CCT GAA AAA TGG AGC ACG GCT TCT GAT CCG TAC AGT GAT TTC 720
Ser Ser Pro Glu Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe
225 230 235 240

GAA AAG GTT ACA GGA CGG ATT GAT AAG AAT GTA TCA CCA GAG GCA AGA 768
Glu Lys Val Thr Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg
245 250 255

CAC CCC CTT GTG GCA GCT TAT CCG ATT GTA CAT GTA GAT ATG GAG AAT 816
His Pro Leu Val Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn
260 265 270

ATT ATT CTC TCA AAA AAT GAG GAT CAA TCC ACA CAG AAT ACT GAT AGT 864
Ile Ile Leu Ser Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser
275 280 285

GAA ACG AGA ACA ATA AGT AAA AAT ACT TCT ACA AGT AGG ACA CAT ACT 912
Glu Thr Arg Thr Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr
290 295 300

AGT GAA GTA CAT GGA AAT GCA GAA GTG CAT GCG TCG TTC TTT GAT ATT 960
Ser Glu Val His Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile
305 310 315 320

GGT GGG AGT GTA TCT GCA GGA TTT AGT AAT TCG AAT TCA AGT ACG GTC 1008
Gly Gly Ser Val Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val
325 330 335

GCA ATT GAT CAT TCA CTA TCT CTA GCA GGG GAA AGA ACT TGG GCT GAA 1056
Ala Ile Asp His Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu
340 345 350

ACA ATG GGT TTA AAT ACC GCT GAT ACA GCA AGA TTA AAT GCC AAT ATT 1104
Thr Met Gly Leu Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile
355 360 365



AGA TAT GTA AAT ACT GGG ACG GCT CCA ATC TAC AAC GTG TTA CCA ACG 1152
Arg Tyr Val Asn Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr
370 375 380

ACT TCG TTA GTG TTA GGA AAA AAT CAA ACA CTC GCG ACA ATT AAA GCT 1200
Thr Ser Leu Val Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala
385 390 395 400

AAG GAA AAC CAA TTA AGT CAA ATA CTT GCA CCT AAT AAT TAT TAT CCT 1248
Lys Glu Asn Gln Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro
405 410 415

TCT AAA AAC TTG GCG CCA ATC GCA TTA AAT GCA CAA GAC GAT TTC AGT 1296
Ser Lys Asn Leu Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser
420 425 430

TCT ACT CCA ATT ACA ATG AAT TAC GGG AAT ATA GCA ACA TAC AAT TTT 1344
Ser Thr Pro Ile Thr Met Asn Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
435 440 445

GAA AAT GGA AGA GTG AGG GTG GAT ACA GGC TCG AAC TGG AGT GAA GTG 1392
Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val
450 455 460

TTA CCC CAA ATT CAA GAA ACA ACT GCA CGT ATC ATT TTT AAT GGA AAA 1440
Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
465 470 475 480



GAT TTA AAT CTG GTA GAA AGG CGG ATA GCG GCG GTT AAT CCT AGT GAT 1488
Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
485 490 495

CCA TTA GAA ACG ACT AAA CCG GAT ATG ACA TTA AAA GAA GCC CTT AAA 1536
Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys
500 505 510

ATA GCA TTT GGA TTT AAC GAA CCG AAT GGA AAC TTA CAA TAT CAA GGG 1584
Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
515 520 525

AAA GAC ATA ACC GAA TTT GAT TTT AAT TTC GAT CAA CAA ACA TCT CAA 1632
Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
530 535 540

AAT ATC AAG AAT CAG TTA GCG GAA TTA AAC GCA ACT AAC ATA TAT ACT 1680
Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
545 550 555 560

GTA TTA GAT AAA ATC AAA TTA AAT GCA AAA ATG AAT ATT TTA ATA AGA 1728
Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
565 570 575

GAT AAA CGT TTT CAT TAT GAT AGA AAT AAC ATA GCA GTT GGG GCG GAT 1776
Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
580 585 590

GAG TCA GTA GTT AAG GAG GCT CAT AGA GAA GTA ATT AAT TCG TCA ACA 1824
Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
595 600 605

GAG GGA TTA TTG TTA AAT ATT GAT AAG GAT ATA AGA AAA ATA TTA TCA 1872
Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
610 615 620

GGT TAT ATT GTA GAA ATT GAA GAT ACT GAA GGG CTT AAA GAA GTT ATA 1920
Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
625 630 635 640



AAT GAC AGA TAT GAT ATG TTG AAT ATT TCT AGT TTA CGG CAA GAT GGA 1968
Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
645 650 655

AAA ACA TTT ATA GAT TTT AAA AAA TAT AAT GAT AAA TTA CCG TTA TAT 2016
Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
660 665 670

ATA AGT AAT CCC AAT TAT AAG GTA AAT GTA TAT GCT GTT ACT AAA GAA 2064
Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
675 680 685

AAC ACT ATT ATT AAT CCT AGT GAG AAT GGG GAT ACT AGT ACC AAC GGG 2112
Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
690 695 700

ATC AAG AAA ATT TTA ATC TTT TCT AAA AAA GGC TAT GAG ATA GGA 2157
Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly
705 710 715

TAA 2160



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 719 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

Glu Val Lys Gln Glu Asn Arg Leu Leu Asn Glu Ser Glu Ser Ser Ser
1 5 10 15

Gln Gly Leu Leu Gly Tyr Tyr Phe Ser Asp Leu Asn Phe Gln Ala Pro
20 25 30

Met Val Val Thr Ser Ser Thr Thr Gly Asp Leu Ser Ile Pro Ser Ser
35 40 45

Glu Leu Glu Asn Ile Pro Ser Glu Asn Gln Tyr Phe Gln Ser Ala Ile
50 55 60

Trp Ser Gly Phe Ile Lys Val Lys Lys Ser Asp Glu Tyr Thr Phe Ala
65 70 75 80

Thr Ser Ala Asp Asn His Val Thr Met Trp Val Asp Asp Gln Glu Val
85 90 95

Ile Asn Lys Ala Ser Asn Ser Asn Lys Ile Arg Leu Glu Lys Gly Arg
100 105 110

Leu Tyr Gln Ile Lys Ile Gln Tyr Gln Arg Glu Asn Pro Thr Glu Lys
115 120 125

Gly Leu Asp Phe Lys Leu Tyr Trp Thr Asp Ser Gln Asn Lys Lys Glu
130 135 140

Val Ile Ser Ser Asp Asn Leu Gln Leu Pro Glu Leu Lys Gln Lys Ser
145 150 155 160

Ser Asn Thr Ala Thr Ile Met Met Gln Arg Gly Asn Phe Leu Gln Gly
165 170 175

Pro Thr Val Pro Asp Arg Asp Asn Asp Gly Ile Pro Asp Ser Leu Glu
180 185 190

Val Glu Gly Tyr Thr Val Asp Val Lys Asn Lys Arg Thr Phe Leu Ser
195 200 205

Pro Trp Ile Ser Asn Ile His Glu Lys Lys Gly Leu Thr Lys Tyr Lys
210 215 220

Ser Ser Pro Glu Lys Trp Ser Thr Ala Ser Asp Pro Tyr Ser Asp Phe
225 230 235 240

Glu Lys Val Thr Gly Arg Ile Asp Lys Asn Val Ser Pro Glu Ala Arg
245 250 255

His Pro Leu Val Ala Ala Tyr Pro Ile Val His Val Asp Met Glu Asn
260 265 270

Ile Ile Leu Ser Lys Asn Glu Asp Gln Ser Thr Gln Asn Thr Asp Ser
275 280 285

Glu Thr Arg Thr Ile Ser Lys Asn Thr Ser Thr Ser Arg Thr His Thr
290 295 300

Ser Glu Val His Gly Asn Ala Glu Val His Ala Ser Phe Phe Asp Ile
305 310 315 320

Gly Gly Ser Val Ser Ala Gly Phe Ser Asn Ser Asn Ser Ser Thr Val
325 330 335

Ala Ile Asp His Ser Leu Ser Leu Ala Gly Glu Arg Thr Trp Ala Glu
340 345 350

Thr Met Gly Leu Asn Thr Ala Asp Thr Ala Arg Leu Asn Ala Asn Ile
355 360 365

Arg Tyr Val Asn Thr Gly Thr Ala Pro Ile Tyr Asn Val Leu Pro Thr
370 375 380

Thr Ser Leu Val Leu Gly Lys Asn Gln Thr Leu Ala Thr Ile Lys Ala
385 390 395 400

Lys Glu Asn Gln Leu Ser Gln Ile Leu Ala Pro Asn Asn Tyr Tyr Pro
405 410 415

Ser Lys Asn Leu Ala Pro Ile Ala Leu Asn Ala Gln Asp Asp Phe Ser
420 425 430

Ser Thr Pro Ile Thr Met Asn Tyr Gly Asn Ile Ala Thr Tyr Asn Phe
435 440 445

Glu Asn Gly Arg Val Arg Val Asp Thr Gly Ser Asn Trp Ser Glu Val
450 455 460

Leu Pro Gln Ile Gln Glu Thr Thr Ala Arg Ile Ile Phe Asn Gly Lys
465 470 475 480

Asp Leu Asn Leu Val Glu Arg Arg Ile Ala Ala Val Asn Pro Ser Asp
485 490 495

Pro Leu Glu Thr Thr Lys Pro Asp Met Thr Leu Lys Glu Ala Leu Lys
500 505 510

Ile Ala Phe Gly Phe Asn Glu Pro Asn Gly Asn Leu Gln Tyr Gln Gly
515 520 525

Lys Asp Ile Thr Glu Phe Asp Phe Asn Phe Asp Gln Gln Thr Ser Gln
530 535 540

Asn Ile Lys Asn Gln Leu Ala Glu Leu Asn Ala Thr Asn Ile Tyr Thr
545 550 555 560

Val Leu Asp Lys Ile Lys Leu Asn Ala Lys Met Asn Ile Leu Ile Arg
565 570 575

Asp Lys Arg Phe His Tyr Asp Arg Asn Asn Ile Ala Val Gly Ala Asp
580 585 590

Glu Ser Val Val Lys Glu Ala His Arg Glu Val Ile Asn Ser Ser Thr
595 600 605

Glu Gly Leu Leu Leu Asn Ile Asp Lys Asp Ile Arg Lys Ile Leu Ser
610 615 620

Gly Tyr Ile Val Glu Ile Glu Asp Thr Glu Gly Leu Lys Glu Val Ile
625 630 635 640

Asn Asp Arg Tyr Asp Met Leu Asn Ile Ser Ser Leu Arg Gln Asp Gly
645 650 655

Lys Thr Phe Ile Asp Phe Lys Lys Tyr Asn Asp Lys Leu Pro Leu Tyr
660 665 670

Ile Ser Asn Pro Asn Tyr Lys Val Asn Val Tyr Ala Val Thr Lys Glu
675 680 685

Asn Thr Ile Ile Asn Pro Ser Glu Asn Gly Asp Thr Ser Thr Asn Gly
690 695 700

Ile Lys Lys Ile Leu Ile Phe Ser Lys Lys Gly Tyr Glu Ile Gly
705 710 715



WHAT IS CLAIMED IS:

1. A nucleic acid encoding a fusion protein,
comprising a nucleotide sequence encoding the anthrax
protective antigen (PA) binding domain of the native anthrax
lethal factor (LF) protein and a nucleotide sequence encoding
an activity inducing domain of a second protein.

2. The nucleic acid of claim 1, wherein the second
protein is a toxin.

3. The nucleic acid of claim 2, wherein the toxin is
Pseudomonas exotoxin A.

4. The nucleic acid of claim 2, wherein the toxin is
the A chain of Diphtheria toxin.

5. The nucleic acid of claim 2, wherein the toxin is 
shiga toxin. 

6. The nucleic acid of claim 1, comprising the
nucleotide sequence defined in the Sequence Listing as SEQ ID
NO:5.

7. The nucleic acid of claim 1, comprising the
nucleotide sequence defined in the Sequence Listing as SEQ ID
NO:6.

8. A protein encoded by the nucleic acid of claim 1.

9. A vector comprising the nucleic acid of claim 1.

10. The vector of claim 9 in a host capable of
expressing the protein encoded by the nucleic acid.

11. A nucleic acid encoding a fusion protein, the
nucleic acid comprising a nucleotide sequence encoding the
translocation domain and anthrax lethal factor (LF) binding
domain of native anthrax protective antigen (PA) protein and a
nucleotide sequence encoding a ligand domain which
specifically binds a cellular target.

12. The nucleic acid of claim 11, wherein the ligand 
domain specifically binds to an HIV protein expressed on the 
surface of an HIV-infected cell. 

13. The nucleic acid of claim 11, wherein the ligand 
domain is a growth factor. 

14. The nucleic acid of claim 11, wherein the 
nucleotide sequence encoding the translocation domain and LF 
binding domain of the native PA protein further comprises the 
nucleotide sequence encoding the remainder of the native PA 
protein. 

15. A protein encoded by the nucleic acid of claim 11.

16. A vector comprising the nucleic acid of claim 11.

17. The vector of claim 16 in a host capable of 
expressing the protein encoded by the nucleic acid. 

18. A method of killing a tumor cell in a subject, 
the method comprising the steps of: 

a) administering to the subject a first fusion
protein comprising the translocation domain and LF binding
domain of the native PA protein and a tumor cell specific
ligand domain in an amount sufficient to bind to a tumor cell;
and

b) administering to the subject a second fusion
protein comprising the PA binding domain of the native LF
protein and a cytotoxic domain of a non-LF protein in an
amount sufficient to bind to the first protein, whereby the
second protein is internalized into the tumor cell and kills
the tumor cell.

19. A method of killing HIV-infected cells in a
subject, the method comprising the steps of: 

a) administering to the subject a first fusion 
protein comprising the translocation domain and LF binding 
domain of the native PA protein and a ligand domain that 
specifically binds to an HIV protein expressed on the surface 
of an HIV-infected cell in an amount sufficient to bind to an 
HIV-infected cell; and 

b) administering to the subject a second fusion 
protein comprising the PA binding domain of the native LF 
protein and a cytotoxic domain of a non-LF protein in an 
amount sufficient to bind to the first protein, whereby the 
second protein is internalized into the HIV-infected cell and 
kills the HIV-infected cell, thereby preventing propagation of 
HIV. 

20. A method for delivering an activity to a cell 
comprising the steps of: 

a) administering to the cell a protein comprising 
the translocation domain and the LF binding domain of the 
native PA protein and a ligand domain; and 

b) administering to the cell a compound comprising 
the PA binding domain of the native LF protein chemically 
attached to an activity inducing moiety, whereby the compound 
administered in step b) is internalized into the cell and 
effects the activity within the cell. 

21. The method of claim 20, wherein the ligand domain 
is the receptor binding domain of the native PA protein.

22. The method of claim 20, wherein the activity 
inducing moiety is a polypeptide. 

23. The method of claim 22, wherein the polypeptide 
is a growth factor. 



24. The method of claim 20, wherein the activity
inducing moiety is an antisense nucleic acid.

25. The method of claim 20, wherein the activity
inducing moiety is a nucleic acid encoding a desired gene
product.

26. A compound comprising the PA binding domain of
the native LF protein chemically attached to a non-LF activity
inducing moiety.

27. The composition of claim 26, wherein the activity
inducing moiety is a polypeptide.

28. The composition of claim 26, wherein the activity
inducing moiety is a radioisotope.

29. The composition of claim 26, wherein the activity
inducing moiety is an antisense nucleic acid.

30. The composition of claim 26, wherein the activity
inducing moiety is a nucleic acid encoding a desired gene
product.

31. The nucleic acid of claim 11, comprising the 
nucleotide sequence defined in the Sequence Listing as SEQ ID 
NO: 11. 

32. A nucleic acid comprising a nucleotide sequence
encoding an anthrax protective antigen which is altered to
include a cleavage site recognized by a protease produced by
an intracellular pathogen.

33. The nucleic acid of claim 32 wherein the
intracellular pathogen is a virus.

34. The nucleic acid of claim 33 wherein the
alteration comprises a mutation in at least one of amino acid
residues 164-167 (the trypsin cleavage site).

35. The nucleic acid of claim 34 wherein the virus is
a retrovirus.

36. The nucleic acid of claim 35 wherein the 
retrovirus is an HIV. 

37. The nucleic acid of claim 36 wherein the amino 
acids at residues 164-167 are replaced with an amino acid 
sequence selected from the group comprising NTATIMMQRGNF,
QVSQNYPIVQNI, TVSFNFPQITLW, and GGSAFNFPIVMGG.

38. A polypeptide comprising an amino acid sequence 
encoding an anthrax protective antigen which is altered to 
include a cleavage site recognized by a protease produced by a 
retrovirus.

39. The polypeptide of claim 38 wherein the 
alteration comprises a mutation in at least one of amino acid 
residues 164-167 (the trypsin cleavage site).

40. The polypeptide of claim 39 wherein the 
retrovirus is an HIV. 

41. The polypeptide of claim 40 wherein the amino 
acid residues 164-167 are replaced with an amino acid sequence 
selected from the group comprising NTATIMMQRGNF, QVSQNYPIVQNI,
TVSFNFPQITLW, and GGSAFNFPIVMGG.

42. A method of killing a cell which is infected with
an intracellular pathogen, the method comprising:

applying to the cell a composition comprising an
effective amount of an altered anthrax protective antigen (PA)
having a cleavage site recognized by a protease produced by
the intracellular pathogen.

43. The method of claim 42 wherein the cleavage site
is at amino acid residues 164-167.

44. The method of claim 42 wherein the intracellular
pathogen is a virus.

45. The method of claim 44 wherein the virus is a
retrovirus.

46. A method of claim 45 wherein the retrovirus is an
HIV.

47. The method of claim 46 wherein the amino acids at
residues 164-167 are replaced with an amino acid sequence
selected from the group comprising NTATIMMQRGNF, QVSQNYPIVQNI,
TVSFNFPQITLW, and GGSAFNFPIVMGG.

48. The method of claim 42 wherein the cell is
harbored in a human.

49. The method of claim 48 wherein the step of
applying the composition includes parenterally administering
the composition to the human.

50. The method of claim 49 wherein the parenteral
administration is intravenous.

51. The method of claim 48 wherein the effective
amount of altered protective antigen is from about 5 to about
25 micrograms per kilogram of body weight of a human harboring
the infected cell.

52. The method of claim 51 wherein the effective
amount of altered protective antigen is about 10 micrograms
per kilogram of body weight of a human harboring the infected
cell.
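The per-kilogram amounts recited in claims 51 and 52 can be converted to a total amount by simple multiplication. The Python sketch below illustrates only the arithmetic; the 70 kg body weight and the function name `dose_range_ug` are hypothetical values chosen for illustration and do not appear in the claims.

```python
def dose_range_ug(body_weight_kg: float,
                  low_ug_per_kg: float = 5.0,
                  high_ug_per_kg: float = 25.0) -> tuple[float, float]:
    """Return the (low, high) total amount in micrograms for a body weight."""
    return body_weight_kg * low_ug_per_kg, body_weight_kg * high_ug_per_kg


BODY_WEIGHT_KG = 70.0  # hypothetical body weight, for illustration only

low_ug, high_ug = dose_range_ug(BODY_WEIGHT_KG)  # range of claim 51
about_10_ug = BODY_WEIGHT_KG * 10.0              # amount of claim 52

print(low_ug, high_ug, about_10_ug)  # 350.0 1750.0 700.0
```

Because the claims qualify each amount with "about", the computed figures are nominal values rather than exact limits.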
Figure 1



Cleavage of mutant PAHIV proteins with purified HIV-1 protease 